☆162Feb 12, 2025Updated last year
Alternatives and similar repositories for Awesome-MLLM-Benchmarks
Users that are interested in Awesome-MLLM-Benchmarks are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [CVPR'2022 Oral] The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation☆32Oct 19, 2023Updated 2 years ago
- [NeurIPS 2023] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models☆24Oct 21, 2025Updated 8 months ago
- [ICCV 2023] Compositional Feature Augmentation for Unbiased Scene Graph Generation☆16Dec 5, 2023Updated 2 years ago
- Official repository of MMDU dataset☆107Sep 29, 2024Updated last year
- Official code for the ICLR2023 paper Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection☆43Jun 4, 2024Updated 2 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- ☆20Nov 28, 2024Updated last year
- ☆12Nov 13, 2024Updated last year
- Spatial Aptitude Training for Multimodal Langauge Models☆33Feb 8, 2026Updated 4 months ago
- [AAAI 2026] Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension☆21Mar 6, 2026Updated 3 months ago
- ☆24Apr 16, 2022Updated 4 years ago
- [ICLR 2025] Official code for Combining Text-based and Drag-based Editing for Precise and Flexible Image Editing.☆21May 6, 2025Updated last year
- optimizing class activation maps by causal inference for weakly-supervised object localization task☆11May 5, 2022Updated 4 years ago
- Can we make visual tracking systems align more closely with human visual perception?☆39Jun 22, 2026Updated last week
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆64Nov 7, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆98Sep 14, 2024Updated last year
- ☆25Dec 23, 2024Updated last year
- ☆40Jan 9, 2026Updated 5 months ago
- Code for the CVPR 2020 oral paper: Weakly Supervised Visual Semantic Parsing☆32Dec 8, 2022Updated 3 years ago
- A Survey on Benchmarks of Multimodal Large Language Models☆155Jun 12, 2026Updated 3 weeks ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles.☆36Mar 11, 2022Updated 4 years ago
- A fork to add multimodal model training to open-r1☆1,576Feb 8, 2025Updated last year
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆63Aug 23, 2024Updated last year
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆60Jul 26, 2024Updated last year
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- ☆11May 24, 2024Updated 2 years ago
- [CVPR'24] Neural Clustering based Visual Representation Learning☆44Oct 6, 2025Updated 8 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆85Jan 27, 2025Updated last year
- ☆17Oct 21, 2024Updated last year
- Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning☆26Aug 28, 2024Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆57Mar 9, 2025Updated last year
- [ACL 2026] OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces☆125May 12, 2026Updated last month
- [ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"☆22Mar 26, 2025Updated last year
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).☆1,026Sep 27, 2025Updated 9 months ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…☆113Jul 9, 2025Updated 11 months ago
- ☆18Apr 23, 2025Updated last year
- Official Code and data for ACL 2024 finding, "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models"☆25Nov 10, 2024Updated last year
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention☆66Aug 30, 2025Updated 10 months ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆89Jun 9, 2026Updated 3 weeks ago
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks☆4,253Updated this week
- TrackGPT: Track What You Need in Videos via Text Prompts☆25May 16, 2023Updated 3 years ago