MME-Benchmarks/MME-RealWorld

✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

☆160

Alternatives and similar repositories for MME-RealWorld

Users who are interested in MME-RealWorld are comparing it to the repositories listed below.

yfzhang114 / SliME
✨✨ Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆163 · Dec 26, 2024 · Updated last year
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆198 · May 1, 2025 · Updated last year
MME-Benchmarks / MME-Unify
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆42 · Apr 10, 2025 · Updated last year
alibaba / conv-llava
☆128 · Jul 29, 2024 · Updated last year
CircleRadon / TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
☆279 · May 26, 2025 · Updated last year
yfzhang114 / r1_reward
✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆291 · May 9, 2025 · Updated last year
VisionXLab / LRS-VQA
[ICCV'25] When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
☆52 · Feb 16, 2026 · Updated 5 months ago
luogen1996 / LLaVA-HR
[ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆249 · Aug 14, 2024 · Updated last year
ParadoxZW / LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
☆35 · Aug 12, 2024 · Updated last year
longrongyang / STGC
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
☆13 · Feb 11, 2025 · Updated last year
MME-Benchmarks / Video-MME
✨✨ [CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆787 · Dec 8, 2025 · Updated 7 months ago
OpenGVLab / MMT-Bench
[ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
☆119 · Apr 6, 2026 · Updated 3 months ago
zzc-1998 / MLLM-QA-Papers-with-Code
Collections of papers and code for employing MLLM for quality assessment tasks.
☆12 · Apr 18, 2024 · Updated 2 years ago
Q-Future / Chinese-Q-Bench
[WIP@Oct 13] Q-Bench in Chinese (质衡-基准测试), including Chinese versions of the low-level visual question answering and low-level visual description datasets, as well as image quality assessment under Chinese prompts. We will release Q-Bench in more languages in the futu…
☆24 · Jan 7, 2024 · Updated 2 years ago
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆126 · Nov 25, 2024 · Updated last year
FrankYang-17 / RealUnify
☆27 · Oct 10, 2025 · Updated 9 months ago
zzc-1998 / GMS-3DQA
Official repo for "GMS-3DQA: Projection-based Grid Mini-patch Sampling for 3D Model Quality Assessment"
☆14 · Mar 10, 2024 · Updated 2 years ago
FrankYang-17 / MME-VideoOCR
☆40 · May 28, 2025 · Updated last year
MMMU-Benchmark / MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…
☆589 · Feb 12, 2026 · Updated 5 months ago
Luo-Z13 / SkySense-Chat
A Scene Graph-Enhanced Remote Sensing Large Vision-Language Model
☆148 · Jan 19, 2026 · Updated 6 months ago
open-compass / VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs); supports 220+ LMMs and 80+ benchmarks
☆4,295 · Updated this week
si0wang / VisVM
☆46 · Dec 30, 2024 · Updated last year
xavier-yu114 / Zoom-Refine
Zoom-Refine: Boosting High-Resolution Multimodal Understanding via Localized Zoom and Self-Refinement
☆19 · Jul 4, 2026 · Updated 2 weeks ago
Tencent-QQMM / Video-CCAM
A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.
☆74 · Oct 14, 2024 · Updated last year
yuweihao / MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆329 · Jan 20, 2025 · Updated last year
multimodal-art-projection / OmniBench
A project for tri-modal LLM benchmarking and instruction tuning.
☆61 · Mar 27, 2025 · Updated last year
lx709 / VRSBench
☆69 · Jun 11, 2026 · Updated last month
VITA-MLLM / VITA
✨✨ [NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
☆2,520 · Mar 28, 2025 · Updated last year
open-compass / MMBench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
☆306 · May 22, 2025 · Updated last year
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆28 · Dec 16, 2024 · Updated last year
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159 · Dec 6, 2024 · Updated last year
deepcs233 / Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆447 · Dec 22, 2024 · Updated last year
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211 · Jan 6, 2025 · Updated last year
Sueqk / LMM-VQA
LMM for VQA, TCSVT version
☆10 · Jul 19, 2024 · Updated 2 years ago
hwanyu112 / VIBE-Benchmark
☆27 · Feb 3, 2026 · Updated 5 months ago
OpenGVLab / MMIU
[ICLR 2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆98 · Sep 14, 2024 · Updated last year
Q-Future / CMC-Bench
[ACMMM 2025] Benchmarking MLLM Codec Ability
☆33 · Jun 14, 2024 · Updated 2 years ago
longvideobench / LongVideoBench
[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆133 · Jul 27, 2024 · Updated last year
tsb0601 / MMVP
☆364 · Jan 27, 2024 · Updated 2 years ago