facebookresearch / CausalVQA
We introduce CausalVQA, a benchmark dataset for video question answering (VQA) composed of question-answer pairs that probe models’ understanding of causality in the physical world.
☆42 · Updated 2 weeks ago
Alternatives and similar repositories for CausalVQA
Users interested in CausalVQA are comparing it to the repositories listed below.
- An open source implementation of CLIP (With TULIP Support) ☆162 · Updated 3 months ago
- ☆182 · Updated 10 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆137 · Updated last year
- Python library to evaluate VLM models' robustness across diverse benchmarks ☆210 · Updated 2 weeks ago
- Official implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding ☆137 · Updated last month
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆210 · Updated 7 months ago
- Code for the paper "Scaling Language-Free Visual Representation Learning" (Web-SSL) ☆179 · Updated 4 months ago
- ✨✨ Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆162 · Updated 8 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆107 · Updated last month
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆129 · Updated 5 months ago
- Official implementation of our NeurIPS 2024 paper "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers" ☆222 · Updated 5 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆69 · Updated last year
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆307 · Updated 3 months ago
- Official implementation of the Law of Vision Representation in MLLMs ☆163 · Updated 9 months ago
- [ICML 2025] Official repository of the paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆138 · Updated last year
- [TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" ☆146 · Updated 9 months ago
- [CVPR 2025] VoCo-LLaMA: Official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models" ☆186 · Updated 2 months ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA ☆333 · Updated 2 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆120 · Updated 4 months ago
- ☆218 · Updated 3 weeks ago
- PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]