multimodal-art-projection / IV-Bench
☆10 · Updated last week
Alternatives and similar repositories for IV-Bench:
Users interested in IV-Bench are comparing it to the repositories listed below.
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR 2025] ☆15 · Updated 2 months ago
- Multimodal RewardBench ☆38 · Updated 2 months ago
- [ACL 2024] TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild ☆47 · Updated last year
- Code for Retrieval-Augmented Perception (RAP) ☆10 · Updated 2 months ago
- ☆30 · Updated 9 months ago
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆41 · Updated 4 months ago
- ☆10 · Updated 6 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆44 · Updated 10 months ago
- Preference Learning for LLaVA ☆44 · Updated 5 months ago
- A Comprehensive Benchmark for Robust Multi-image Understanding ☆10 · Updated 8 months ago
- Official code of *Towards Event-oriented Long Video Understanding* ☆12 · Updated 9 months ago
- The released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models" ☆32 · Updated last year
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ☆43 · Updated 2 years ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆47 · Updated last month
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…" ☆22 · Updated last month
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆25 · Updated 7 months ago
- ☆40 · Updated 4 months ago
- ☆28 · Updated last month
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding ☆24 · Updated this week
- ☆51 · Updated last year
- ☆18 · Updated 9 months ago
- This repo contains code and data for the ICLR 2025 paper "MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs" ☆31 · Updated last month
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding" ☆13 · Updated last year
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆65 · Updated 2 weeks ago
- ☆99 · Updated last year
- [NeurIPS 2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs ☆18 · Updated 6 months ago
- Code for "Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality", EMNLP 2022 ☆30 · Updated last year
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆53 · Updated last week
- The implementation of CounterCurate, a data curation pipeline for generating both physical and semantic counterfactual image-caption pairs ☆18 · Updated 10 months ago
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆55 · Updated 6 months ago