BAAI-WuDao / WuDaoMM

WuDaoMM is a data project.

☆74

Alternatives and similar repositories for WuDaoMM

Users who are interested in WuDaoMM are comparing it to the repositories listed below.

MUGE-2021 / image-caption-baseline
☆66 Updated last year
zhanxlin / Product1M
Product1M
☆89 Updated 3 years ago
rucmlcv / Wenlan-Video-Public
☆19 Updated 3 years ago
ksOAn6g5 / TaiSu
TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)
☆191 Updated 2 years ago
yuxie11 / R2D2
☆168 Updated 2 years ago
MUGE-2021 / image-retrieval-baseline
☆59 Updated 3 years ago
KwaiKEG / Kuaipedia
The world's first large-scale multi-modal short-video encyclopedia, where the primitive units are items, aspects, and short videos.
☆64 Updated last year
chuhaojin / BriVL-BUA-applications
Bling's Object detection tool
☆56 Updated 2 years ago
billjie1 / Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
☆169 Updated 3 years ago
RUC-AIMind / TikTalk
☆70 Updated 5 months ago
li-xirong / coco-cn
Enriching MS-COCO with Chinese sentences and tags for cross-lingual multimedia tasks
☆207 Updated 9 months ago
Weili-NLP / UNIMO
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
☆70 Updated 4 years ago
Junction4Nako / mvp_pytorch
PyTorch implementation of MVP: a multi-stage vision-language pre-training framework
☆34 Updated 2 years ago
BAAI-WuDao / BriVL
Bridging Vision and Language Model
☆285 Updated 2 years ago
MUGE-2021 / image-generation-baseline
☆32 Updated 3 years ago
microsoft / M3P
Multitask Multilingual Multimodal Pre-training
☆71 Updated 2 years ago
zengyan-97 / CCLM
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023)
☆92 Updated 2 years ago
thu-ml / zh-clip
☆72 Updated 2 years ago
benywon / ChiQA
The implementations of various baselines in our CIKM 2022 paper: ChiQA: A Large Scale Image-based Real-World Question Answering Dataset f…
☆33 Updated last year
alibaba / EssentialMC2
EssentialMC2 Video Understanding.
☆114 Updated 3 years ago
ShannonAI / OpenViDial
Code, Models and Datasets for OpenViDial Dataset
☆132 Updated 3 years ago
chuhaojin / WenLan-api-document
Documentation for the WenLan API, which is used to obtain image and text features.
☆41 Updated 2 years ago
zerovl / ZeroVL
[ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources
☆45 Updated 3 years ago
OFA-Sys / OFA-Compress
OFA-Compress is a unified framework which provides OFA model finetuning, distillation and inference capabilities in Huggingface version, …
☆29 Updated 3 years ago
Junya-Chen / FlatCLR
FlatNCE: A Novel Contrastive Representation Learning Objective
☆89 Updated 4 years ago
hrlinlp / cepsum
☆44 Updated 3 years ago
jd-aig / JAVE
☆88 Updated 5 years ago
intersun / LightningDOT
Source code and pre-trained/fine-tuned checkpoints for the NAACL 2021 paper LightningDOT
☆72 Updated 3 years ago
yangjianxin1 / OFA-Chinese
A Chinese OFA model built on the transformers architecture
☆136 Updated 2 years ago
MichaelZhouwang / VLUE
This repo contains code and instructions for baselines in the VLUE benchmark.
☆41 Updated 3 years ago