HKUST-LongGroup / Awesome-MLLM-Benchmarks
☆158 · Feb 12, 2025 · Updated last year
Alternatives and similar repositories for Awesome-MLLM-Benchmarks
Users interested in Awesome-MLLM-Benchmarks are comparing it to the repositories listed below
- [CVPR'2022 Oral] The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation ☆32 · Oct 19, 2023 · Updated 2 years ago
- [NeurIPS'2023] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models ☆22 · Oct 21, 2025 · Updated 3 months ago
- Official repository of MMDU dataset ☆103 · Sep 29, 2024 · Updated last year
- ☆12 · Nov 13, 2024 · Updated last year
- ☆20 · Nov 28, 2024 · Updated last year
- [ICCV'2023] Compositional Feature Augmentation for Unbiased Scene Graph Generation ☆15 · Dec 5, 2023 · Updated 2 years ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆94 · Sep 14, 2024 · Updated last year
- Can we make visual tracking systems align more closely with human visual perception? ☆17 · Updated this week
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆60 · Aug 23, 2024 · Updated last year
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… ☆62 · Nov 7, 2024 · Updated last year
- [Up-To-Date] Awesome Agent Memory Paper Resource ☆50 · Updated this week
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆61 · Jul 26, 2024 · Updated last year
- Spatial Aptitude Training for Multimodal Language Models ☆24 · Feb 8, 2026 · Updated last week
- ☆18 · Apr 20, 2025 · Updated 9 months ago
- Official code for the ICLR2023 paper Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection ☆43 · Jun 4, 2024 · Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models ☆148 · Jul 1, 2025 · Updated 7 months ago
- ☆25 · Apr 16, 2022 · Updated 3 years ago
- [ICLR2025] Official code for Combining Text-based and Drag-based Editing for Precise and Flexible Image Editing ☆20 · May 6, 2025 · Updated 9 months ago
- ☆16 · Oct 21, 2024 · Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆54 · Mar 9, 2025 · Updated 11 months ago
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention ☆66 · Aug 30, 2025 · Updated 5 months ago
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R… ☆111 · Jul 9, 2025 · Updated 7 months ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models ☆123 · Jul 1, 2024 · Updated last year
- An evaluation suite for Retrieval-Augmented Generation (RAG) ☆23 · Apr 26, 2025 · Updated 9 months ago
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM) ☆979 · Sep 27, 2025 · Updated 4 months ago
- A fork to add multimodal model training to open-r1 ☆1,474 · Feb 8, 2025 · Updated last year
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench ☆113 · Jul 27, 2024 · Updated last year
- Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks ☆3,816 · Updated this week
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆86 · Jan 27, 2025 · Updated last year
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆477 · Jan 17, 2025 · Updated last year
- Official code and data for ACL 2024 Findings, "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models" ☆25 · Nov 10, 2024 · Updated last year
- [ECCV'24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models" ☆21 · Mar 26, 2025 · Updated 10 months ago
- Code for the CVPR 2020 oral paper: Weakly Supervised Visual Semantic Parsing ☆33 · Dec 8, 2022 · Updated 3 years ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆85 · Oct 26, 2025 · Updated 3 months ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles ☆36 · Mar 11, 2022 · Updated 3 years ago
- ☆11 · May 24, 2024 · Updated last year
- ✨ First Open-Source R1-like Video-LLM [2025/02/18] ☆381 · Feb 23, 2025 · Updated 11 months ago
- GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆88 · Sep 23, 2025 · Updated 4 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆24 · Sep 9, 2024 · Updated last year