☆159 · Feb 12, 2025 · Updated last year
Alternatives and similar repositories for Awesome-MLLM-Benchmarks
Users that are interested in Awesome-MLLM-Benchmarks are comparing it to the libraries listed below.
- [CVPR'2022 Oral] The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation ☆32 · Oct 19, 2023 · Updated 2 years ago
- [NeurIPS 2023] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models ☆22 · Oct 21, 2025 · Updated 5 months ago
- [ICCV 2023] Compositional Feature Augmentation for Unbiased Scene Graph Generation ☆15 · Dec 5, 2023 · Updated 2 years ago
- Official repository of MMDU dataset ☆105 · Sep 29, 2024 · Updated last year
- Official code for the ICLR2023 paper Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection ☆43 · Jun 4, 2024 · Updated last year
- Spatial Aptitude Training for Multimodal Language Models ☆26 · Feb 8, 2026 · Updated last month
- ☆12 · Nov 13, 2024 · Updated last year
- ☆20 · Nov 28, 2024 · Updated last year
- [AAAI 2026] Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension ☆18 · Mar 6, 2026 · Updated 3 weeks ago
- ☆25 · Apr 16, 2022 · Updated 3 years ago
- [ICLR 2025] Official code for Combining Text-based and Drag-based Editing for Precise and Flexible Image Editing. ☆20 · May 6, 2025 · Updated 10 months ago
- Optimizing class activation maps by causal inference for the weakly-supervised object localization task ☆11 · May 5, 2022 · Updated 3 years ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… ☆62 · Nov 7, 2024 · Updated last year
- Can we make visual tracking systems align more closely with human visual perception? ☆29 · Mar 18, 2026 · Updated last week
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆96 · Sep 14, 2024 · Updated last year
- ☆25 · Dec 23, 2024 · Updated last year
- ☆35 · Jan 9, 2026 · Updated 2 months ago
- Code for the CVPR 2020 oral paper: Weakly Supervised Visual Semantic Parsing ☆33 · Dec 8, 2022 · Updated 3 years ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles. ☆36 · Mar 11, 2022 · Updated 4 years ago
- A fork to add multimodal model training to open-r1 ☆1,514 · Feb 8, 2025 · Updated last year
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆62 · Aug 23, 2024 · Updated last year
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆62 · Jul 26, 2024 · Updated last year
- [CVPR'24] Neural Clustering based Visual Representation Learning ☆44 · Oct 6, 2025 · Updated 5 months ago
- ☆11 · May 24, 2024 · Updated last year
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆86 · Jan 27, 2025 · Updated last year
- ☆16 · Oct 21, 2024 · Updated last year
- Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning ☆25 · Aug 28, 2024 · Updated last year
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆477 · Jan 17, 2025 · Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆55 · Mar 9, 2025 · Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models ☆150 · Jul 1, 2025 · Updated 8 months ago
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). ☆996 · Sep 27, 2025 · Updated 6 months ago
- [ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models" ☆22 · Mar 26, 2025 · Updated last year
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R… ☆111 · Jul 9, 2025 · Updated 8 months ago
- ☆17 · Apr 23, 2025 · Updated 11 months ago
- Official code and data for the ACL 2024 Findings paper "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models" ☆25 · Nov 10, 2024 · Updated last year
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention ☆66 · Aug 30, 2025 · Updated 6 months ago
- GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆89 · Sep 23, 2025 · Updated 6 months ago
- Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks ☆3,958 · Updated this week
- TrackGPT: Track What You Need in Videos via Text Prompts ☆25 · May 16, 2023 · Updated 2 years ago