Wang-ML-Lab / multimodal-needle-in-a-haystack
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
☆41 Updated last month
Alternatives and similar repositories for multimodal-needle-in-a-haystack:
Users who are interested in multimodal-needle-in-a-haystack are comparing it to the repositories listed below.
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆65 Updated 10 months ago
- Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?" ☆43 Updated last month
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆28 Updated this week
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆62 Updated 6 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆36 Updated last month
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆42 Updated 9 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆42 Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆54 Updated 5 months ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023. ☆32 Updated last year
- (ICLR2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception. ☆27 Updated last month
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆56 Updated this week
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆42 Updated 5 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆48 Updated 3 weeks ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆15 Updated last month
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆25 Updated last month
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆76 Updated 2 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆75 Updated 7 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆70 Updated 4 months ago
- Code for Heima ☆40 Updated 2 months ago
- Preference Learning for LLaVA ☆43 Updated 5 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆89 Updated 2 weeks ago
- Open-Pandora: On-the-fly Control Video Generation ☆33 Updated 4 months ago