Letian2003 / C-VQA
Counterfactual Reasoning VQA Dataset
☆27 · Updated Nov 23, 2023
Alternatives and similar repositories for C-VQA
Users interested in C-VQA are comparing it to the repositories listed below.
- Project for SNARE benchmark · ☆11 · Updated Jun 5, 2024
- Benchmarking Multi-Image Understanding in Vision and Language Models · ☆12 · Updated Jul 29, 2024
- ☆12 · Updated Mar 8, 2021
- [CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! · ☆17 · Updated May 14, 2024
- [NeurIPS'24 spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning. [TPAMI'25] MECD+ · ☆45 · Updated Oct 28, 2025
- Implementation of CounterCurate, a data curation pipeline for both physical and semantic counterfactual image-caption pairs · ☆19 · Updated Jun 27, 2024
- VQA driven by bottom-up and top-down attention and knowledge · ☆14 · Updated Nov 21, 2018
- [CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering · ☆20 · Updated Sep 21, 2024
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer · ☆37 · Updated Oct 18, 2023
- [ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning · ☆19 · Updated Jun 7, 2024
- ☆20 · Updated May 3, 2025
- ☆85 · Updated Dec 4, 2022
- ☆20 · Updated Oct 21, 2022
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension · ☆60 · Updated Apr 8, 2024
- ☆25 · Updated Jan 27, 2024
- Can 3D Vision-Language Models Truly Understand Natural Language? · ☆20 · Updated Mar 28, 2024
- Repo for the ICCV 2021 paper "Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering" · ☆28 · Updated Jul 1, 2024
- [ECCV'22 Poster] Explicit Image Caption Editing · ☆22 · Updated Nov 30, 2022
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" · ☆53 · Updated Sep 21, 2023
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention · ☆61 · Updated Jul 16, 2024
- [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training · ☆26 · Updated Dec 5, 2023
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration · ☆26 · Updated Oct 17, 2024
- Repo for the EMNLP 2023 paper "A Simple Knowledge-Based Visual Question Answering" · ☆25 · Updated Dec 14, 2023
- TallyQA: Answering Complex Counting Questions dataset · ☆29 · Updated Feb 19, 2024
- [ECCV2022] Dense Siamese Network for Dense Unsupervised Learning · ☆29 · Updated Jul 21, 2022
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 · ☆61 · Updated Jul 26, 2024
- Code for Greedy Gradient Ensemble for Visual Question Answering (ICCV 2021, Oral) · ☆27 · Updated Mar 28, 2022
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights · ☆28 · Updated Oct 28, 2024
- ☆27 · Updated Oct 7, 2021
- A Unified Framework for Video-Language Understanding · ☆61 · Updated Jun 17, 2023
- [CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training · ☆38 · Updated Mar 27, 2025
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! · ☆25 · Updated Nov 23, 2024
- [ICLR 2026] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models" · ☆48 · Updated Feb 4, 2026
- CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations · ☆29 · Updated Oct 27, 2023
- VisualGPTScore for visio-linguistic reasoning · ☆27 · Updated Oct 7, 2023
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" · ☆37 · Updated Aug 18, 2024
- ☆32 · Updated Mar 25, 2024
- An Enhanced CLIP Framework for Learning with Synthetic Captions · ☆39 · Updated Apr 18, 2025
- Official implementation of the paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" accepted by NeurIPS… · ☆27 · Updated May 14, 2024