✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆52Jul 11, 2025Updated 7 months ago
Alternatives and similar repositories for CMM
Users that are interested in CMM are comparing it to the libraries listed below
Sorting:
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆43Feb 27, 2025Updated last year
- Multi-Task instruction-tuned LLaMA☆14May 5, 2023Updated 2 years ago
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C…☆279Jan 16, 2025Updated last year
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding☆379Oct 7, 2024Updated last year
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention☆66Aug 30, 2025Updated 6 months ago
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"☆194Mar 17, 2025Updated 11 months ago
- [AAAI 2025] Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use.☆25Dec 30, 2024Updated last year
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆31Dec 23, 2024Updated last year
- The implementation of VectorNet. Done and Lose☆41Jun 21, 2020Updated 5 years ago
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding"☆77Feb 28, 2025Updated last year
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆45Apr 3, 2025Updated 11 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆34Nov 13, 2024Updated last year
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Jul 15, 2025Updated 7 months ago
- [ICLR 2026] Official PyTorch implementation for "ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding"☆57Dec 26, 2025Updated 2 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" [EMNLP25]☆38Feb 1, 2026Updated last month
- [ACL2023 Area Chair Award] Official repo for the paper "Tell2Design: A Dataset for Language-Guided Floor Plan Generation".☆80Mar 14, 2025Updated 11 months ago
- [EMNLP 2023] Once Upon a *Time* in *Graph*: Relative-Time Pretraining for Complex Temporal Reasoning☆17Oct 31, 2023Updated 2 years ago
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs☆1,278Jan 23, 2025Updated last year
- [ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++☆219Feb 2, 2026Updated last month
- The code for PixelRefer & VideoRefer☆344Nov 16, 2025Updated 3 months ago
- Unifying Specialized Visual Encoders for Video Language Models☆25Nov 22, 2025Updated 3 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆92Jun 28, 2024Updated last year
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆24Sep 26, 2024Updated last year
- [CVPR 2023] Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning☆22Jun 11, 2023Updated 2 years ago
- [ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"☆95Nov 30, 2025Updated 3 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…☆47Aug 21, 2024Updated last year
- ☆43Oct 7, 2024Updated last year
- Contrastive Chain-of-Thought Prompting☆68Nov 18, 2023Updated 2 years ago
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆42Dec 16, 2025Updated 2 months ago
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…☆111Jul 9, 2025Updated 7 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…☆71Feb 9, 2026Updated 3 weeks ago
- 轻小说文库 epub 解析打包☆21May 3, 2020Updated 5 years ago
- ☆13Aug 7, 2025Updated 6 months ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows☆19Nov 4, 2025Updated 4 months ago
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025☆276May 26, 2025Updated 9 months ago
- This repo contains the code for the paper "Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attrib…☆33Jul 14, 2025Updated 7 months ago
- Large Language Models Can Self-Improve in Long-context Reasoning☆72Nov 24, 2024Updated last year
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources☆216Sep 26, 2025Updated 5 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆54Mar 9, 2025Updated 11 months ago