yfzhang114 / MME-RealWorld

✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

☆97

Alternatives and similar repositories for MME-RealWorld:

Users who are interested in MME-RealWorld are comparing it to the repositories listed below.

Liuziyu77 / MMDU
Official repository of the MMDU dataset
☆86 Updated 5 months ago
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76 Updated last year
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆54 Updated last month
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆57 Updated 5 months ago
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆75 Updated 2 weeks ago
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆168 Updated 5 months ago
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆48 Updated 8 months ago
palchenli / VL-Instruction-Tuning
☆91 Updated last year
RifleZhang / LLaVA-Reasoner-DPO
☆66 Updated 2 months ago
Liuziyu77 / RAR
The official implementation of RAR
☆82 Updated 11 months ago
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆156 Updated 5 months ago
OpenGVLab / LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆68 Updated last month
OpenGVLab / V2PE
[arXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆31 Updated 3 months ago
shufangxun / LLaVA-MoD
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
☆113 Updated 2 months ago
KangarooGroup / Kangaroo
Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input
☆63 Updated 6 months ago
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆113 Updated 3 months ago
OpenGVLab / MMT-Bench
ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
☆104 Updated 8 months ago
hshjerry / VideoEspresso
[CVPR'25] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆61 Updated 3 weeks ago
X2FD / LVIS-INSTRUCT4V
☆133 Updated last year
alibaba / conv-llava
☆111 Updated 7 months ago
thunlp / Muffin
☆61 Updated last year
BAAI-DCAI / MMVU
☆41 Updated this week
yfzhang114 / LLaVA-Align
This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…
☆76 Updated last month
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆45 Updated last week
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆136 Updated 3 months ago