FudanDISC / weakly-supervised-mVLP
Implementation of our ACL 2023 paper: Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training
☆15 · Updated last year
Related projects
Alternatives and complementary repositories for weakly-supervised-mVLP
- ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities ☆33 · Updated 2 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆32 · Updated last week
- [ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa… ☆48 · Updated 4 months ago
- a multimodal retrieval dataset ☆22 · Updated last year
- Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retriev… ☆27 · Updated 10 months ago
- EMNLP 2023 - InfoSeek: A New VQA Benchmark Focused on Visual Info-Seeking Questions ☆16 · Updated 5 months ago
- [ICML 2022] Code and data for our paper "IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages" ☆49 · Updated last year
- Code and data for ImageCoDe, a contextual vision-and-language benchmark ☆39 · Updated 8 months ago
- The SVO-Probes Dataset for Verb Understanding ☆31 · Updated 2 years ago
- Official repository for the A-OKVQA dataset ☆64 · Updated 6 months ago
- MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379) ☆29 · Updated 7 months ago
- [ACL 2024] FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model ☆10 · Updated 2 months ago
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality ☆70 · Updated 9 months ago
- Code for our EMNLP-2022 paper: "Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning" ☆12 · Updated last year
- [ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer. ☆10 · Updated 8 months ago
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning ☆133 · Updated last year
- This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin… ☆47 · Updated 3 months ago
- NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings ☆53 · Updated 5 months ago
- Retrieval-augmented Image Captioning ☆12 · Updated last year
- On the Effectiveness of Parameter-Efficient Fine-Tuning ☆38 · Updated last year
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat… ☆43 · Updated 3 months ago