✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆152Oct 21, 2025Updated 4 months ago
Alternatives and similar repositories for MME-RealWorld
Users that are interested in MME-RealWorld are comparing it to the libraries listed below
Sorting:
- The Next Step Forward in Multimodal LLM Alignment☆197May 1, 2025Updated 10 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆164Dec 26, 2024Updated last year
- ☆124Jul 29, 2024Updated last year
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025☆276May 26, 2025Updated 9 months ago
- [ICML 2024] | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI☆116Jul 18, 2024Updated last year
- A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo☆34Aug 12, 2024Updated last year
- [ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant☆246Aug 14, 2024Updated last year
- [ICCV'25] When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning☆47Feb 16, 2026Updated 2 weeks ago
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis☆731Dec 8, 2025Updated 2 months ago
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.☆74Oct 14, 2024Updated last year
- [WIP@Oct 13] 质衡-基准测试 (Q-Bench in Chinese),包含中文版【底层视觉问答】和【底层视觉描述】数据集,以及中文提示下的图片质量评价。 We will release Q-Bench in more languages in the futu…☆24Jan 7, 2024Updated 2 years ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆123Nov 25, 2024Updated last year
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks☆3,845Updated this week
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"☆287May 22, 2025Updated 9 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆94Sep 14, 2024Updated last year
- LMM for VQA, tcsvt version☆11Jul 19, 2024Updated last year
- Official repo for "GMS-3DQA: Projection-based Grid Mini-patch Sampling for 3D Model Quality Assessment"☆14Mar 10, 2024Updated last year
- Collections of papers and code for employing MLLM for quality assessment tasks.☆13Apr 18, 2024Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆213Jan 6, 2025Updated last year
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆79Jul 1, 2025Updated 8 months ago
- ✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning☆281May 9, 2025Updated 9 months ago
- ☆46Dec 30, 2024Updated last year
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆43Apr 10, 2025Updated 10 months ago
- ☆37May 28, 2025Updated 9 months ago
- Official repository of MMDU dataset☆104Sep 29, 2024Updated last year
- ☆37Sep 16, 2024Updated last year
- A Scene Graph-Enhanced Remote Sensing Large Vision-Language Model☆138Jan 19, 2026Updated last month
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…☆63May 15, 2025Updated 9 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Nov 22, 2024Updated last year
- ☆65Feb 2, 2026Updated last month
- ☆360Jan 27, 2024Updated 2 years ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression☆67Feb 19, 2025Updated last year
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)☆322Jan 20, 2025Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs☆175Oct 6, 2025Updated 4 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆139Jun 12, 2024Updated last year
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models"☆18Oct 1, 2024Updated last year
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆113Jul 27, 2024Updated last year
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆161Sep 27, 2025Updated 5 months ago