☆160Feb 12, 2025Updated last year
Alternatives and similar repositories for Awesome-MLLM-Benchmarks
Users who are interested in Awesome-MLLM-Benchmarks are comparing it to the libraries listed below.
- [CVPR'2022 Oral] The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation☆33Oct 19, 2023Updated 2 years ago
- [NeurIPS 2023] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models☆22Oct 21, 2025Updated 5 months ago
- [ICCV 2023] Compositional Feature Augmentation for Unbiased Scene Graph Generation☆15Dec 5, 2023Updated 2 years ago
- Official repository of MMDU dataset☆107Sep 29, 2024Updated last year
- Official code for the ICLR2023 paper Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection☆43Jun 4, 2024Updated last year
- ☆20Nov 28, 2024Updated last year
- ☆12Nov 13, 2024Updated last year
- Spatial Aptitude Training for Multimodal Language Models☆29Feb 8, 2026Updated 2 months ago
- [AAAI 2026] Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension☆18Mar 6, 2026Updated last month
- ☆25Apr 16, 2022Updated 4 years ago
- [ICLR 2025] Official code for Combining Text-based and Drag-based Editing for Precise and Flexible Image Editing.☆20May 6, 2025Updated 11 months ago
- optimizing class activation maps by causal inference for weakly-supervised object localization task☆11May 5, 2022Updated 3 years ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆62Nov 7, 2024Updated last year
- Can we make visual tracking systems align more closely with human visual perception?☆33Mar 18, 2026Updated last month
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆97Sep 14, 2024Updated last year
- ☆25Dec 23, 2024Updated last year
- ☆36Jan 9, 2026Updated 3 months ago
- Code for the CVPR 2020 oral paper: Weakly Supervised Visual Semantic Parsing☆33Dec 8, 2022Updated 3 years ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles.☆36Mar 11, 2022Updated 4 years ago
- A fork to add multimodal model training to open-r1☆1,528Feb 8, 2025Updated last year
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆63Aug 23, 2024Updated last year
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆62Jul 26, 2024Updated last year
- [CVPR'24] Neural Clustering based Visual Representation Learning☆44Oct 6, 2025Updated 6 months ago
- ☆11May 24, 2024Updated last year
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆86Jan 27, 2025Updated last year
- ☆16Oct 21, 2024Updated last year
- Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning☆25Aug 28, 2024Updated last year
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey☆478Jan 17, 2025Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆55Mar 9, 2025Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models☆150Jul 1, 2025Updated 9 months ago
- OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems☆122Jul 13, 2025Updated 9 months ago
- [ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"☆22Mar 26, 2025Updated last year
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).☆1,003Sep 27, 2025Updated 6 months ago
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…☆112Jul 9, 2025Updated 9 months ago
- ☆17Apr 23, 2025Updated 11 months ago
- Official Code and data for ACL 2024 finding, "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models"☆25Nov 10, 2024Updated last year
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention☆66Aug 30, 2025Updated 7 months ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆90Sep 23, 2025Updated 6 months ago
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks☆4,047Apr 10, 2026Updated last week