☆160Feb 12, 2025Updated last year
Alternatives and similar repositories for Awesome-MLLM-Benchmarks
Users that are interested in Awesome-MLLM-Benchmarks are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [CVPR'2022 Oral] The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation☆32Oct 19, 2023Updated 2 years ago
- [NeurIPS 2023] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models☆23Oct 21, 2025Updated 7 months ago
- [ICCV 2023] Compositional Feature Augmentation for Unbiased Scene Graph Generation☆16Dec 5, 2023Updated 2 years ago
- Official repository of MMDU dataset☆105Sep 29, 2024Updated last year
- Official code for the ICLR2023 paper Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection☆43Jun 4, 2024Updated last year
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- ☆20Nov 28, 2024Updated last year
- ☆12Nov 13, 2024Updated last year
- Spatial Aptitude Training for Multimodal Langauge Models☆31Feb 8, 2026Updated 3 months ago
- [AAAI 2026] Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension☆20Mar 6, 2026Updated 2 months ago
- ☆24Apr 16, 2022Updated 4 years ago
- Can we make visual tracking systems align more closely with human visual perception?☆33Apr 26, 2026Updated 3 weeks ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆63Nov 7, 2024Updated last year
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆98Sep 14, 2024Updated last year
- ☆25Dec 23, 2024Updated last year
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆38Jan 9, 2026Updated 4 months ago
- Code for the CVPR 2020 oral paper: Weakly Supervised Visual Semantic Parsing☆33Dec 8, 2022Updated 3 years ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles.☆36Mar 11, 2022Updated 4 years ago
- A fork to add multimodal model training to open-r1☆1,548Feb 8, 2025Updated last year
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆63Aug 23, 2024Updated last year
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆60Jul 26, 2024Updated last year
- ☆11May 24, 2024Updated last year
- [CVPR'24] Neural Clustering based Visual Representation Learning☆44Oct 6, 2025Updated 7 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆85Jan 27, 2025Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆16Oct 21, 2024Updated last year
- Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning☆25Aug 28, 2024Updated last year
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey☆477Jan 17, 2025Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆57Mar 9, 2025Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models☆149Jul 1, 2025Updated 10 months ago
- OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems☆124May 12, 2026Updated last week
- [ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"☆22Mar 26, 2025Updated last year
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).☆1,018Sep 27, 2025Updated 7 months ago
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…☆112Jul 9, 2025Updated 10 months ago
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆18Apr 23, 2025Updated last year
- Official Code and data for ACL 2024 finding, "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models"☆25Nov 10, 2024Updated last year
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention☆66Aug 30, 2025Updated 8 months ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆89Sep 23, 2025Updated 7 months ago
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks☆4,145May 15, 2026Updated last week
- TrackGPT: Track What You Need in Videos via Text Prompts☆25May 16, 2023Updated 3 years ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18]☆383Feb 23, 2025Updated last year