☆163Feb 12, 2025Updated last year
Alternatives and similar repositories for Awesome-MLLM-Benchmarks
Users that are interested in Awesome-MLLM-Benchmarks are comparing it to the libraries listed below.
- [CVPR'2022 Oral] The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation☆32Oct 19, 2023Updated 2 years ago
- [NeurIPS 2023] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models☆23Oct 21, 2025Updated 7 months ago
- [ICCV 2023] Compositional Feature Augmentation for Unbiased Scene Graph Generation☆16Dec 5, 2023Updated 2 years ago
- Official repository of MMDU dataset☆105Sep 29, 2024Updated last year
- Official code for the ICLR2023 paper Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection☆43Jun 4, 2024Updated 2 years ago
- ☆12Nov 13, 2024Updated last year
- ☆20Nov 28, 2024Updated last year
- Spatial Aptitude Training for Multimodal Language Models☆33Feb 8, 2026Updated 4 months ago
- [AAAI 2026] Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension☆20Mar 6, 2026Updated 3 months ago
- ☆24Apr 16, 2022Updated 4 years ago
- [ICLR 2025] Official code for Combining Text-based and Drag-based Editing for Precise and Flexible Image Editing.☆21May 6, 2025Updated last year
- optimizing class activation maps by causal inference for weakly-supervised object localization task☆11May 5, 2022Updated 4 years ago
- Can we make visual tracking systems align more closely with human visual perception?☆35Apr 26, 2026Updated last month
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆63Nov 7, 2024Updated last year
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆98Sep 14, 2024Updated last year
- ☆25Dec 23, 2024Updated last year
- ☆39Jan 9, 2026Updated 5 months ago
- Code for the CVPR 2020 oral paper: Weakly Supervised Visual Semantic Parsing☆32Dec 8, 2022Updated 3 years ago
- A Survey on Benchmarks of Multimodal Large Language Models☆151May 27, 2026Updated 2 weeks ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles.☆36Mar 11, 2022Updated 4 years ago
- A fork to add multimodal model training to open-r1☆1,559Feb 8, 2025Updated last year
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆63Aug 23, 2024Updated last year
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆60Jul 26, 2024Updated last year
- ☆11May 24, 2024Updated 2 years ago
- [CVPR'24] Neural Clustering based Visual Representation Learning☆44Oct 6, 2025Updated 8 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆85Jan 27, 2025Updated last year
- ☆16Oct 21, 2024Updated last year
- Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning☆25Aug 28, 2024Updated last year
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey☆477Jan 17, 2025Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆56Mar 9, 2025Updated last year
- [ACL 2026] OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces☆125May 12, 2026Updated last month
- [ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"☆22Mar 26, 2025Updated last year
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).☆1,022Sep 27, 2025Updated 8 months ago
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…☆112Jul 9, 2025Updated 11 months ago
- ☆18Apr 23, 2025Updated last year
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention☆66Aug 30, 2025Updated 9 months ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆89Updated this week
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks☆4,196Jun 5, 2026Updated last week
- TrackGPT: Track What You Need in Videos via Text Prompts☆25May 16, 2023Updated 3 years ago