OpenGVLab / MM-NIAHLinks

[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

☆117

Alternatives and similar repositories for MM-NIAH

Users that are interested in MM-NIAH are comparing it to the libraries listed below

Sorting:

Liuziyu77 / MMDU
Official repository of MMDU dataset
☆98Updated last year
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆277Updated last year
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆109Updated 6 months ago
OpenGVLab / MMT-Bench
[ICML 2024] | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
☆115Updated last year
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆298Updated last year
TideDra / VL-RLHF
A RLHF Infrastructure for Vision-Language Models
☆187Updated last year
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆200Updated last year
LengSicong / MMR1
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
☆210Updated 2 months ago
open-compass / MMBench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
☆272Updated 6 months ago
TIGER-AI-Lab / Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
☆234Updated 8 months ago
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆92Updated last year
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆87Updated 10 months ago
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆58Updated 11 months ago
DAMO-NLP-SG / multimodal_textbook
[ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆179Updated 8 months ago
vlf-silkie / VLFeedback
☆100Updated last year
zai-org / LVBench
[ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark
☆127Updated 4 months ago
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆146Updated last month
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆168Updated 6 months ago
thunlp / Muffin
☆66Updated last year
RifleZhang / LLaVA-Hound-DPO
☆155Updated last year
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆170Updated 2 months ago
zai-org / CogCoM
☆215Updated last year
RifleZhang / LLaVA-Reasoner-DPO
☆102Updated 10 months ago
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76Updated last year
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆84Updated 10 months ago
InfiMM / Awesome-Multimodal-LLM-for-Math-STEM
Paper collections of multi-modal LLM for Math/STEM/Code.
☆130Updated 2 weeks ago
opendatalab / HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
☆98Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180Updated last year
X2FD / LVIS-INSTRUCT4V
☆133Updated last year
FreedomIntelligence / MLLM-Bench
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
☆72Updated last year