Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models
☆ 77 · Updated Jul 14, 2025
Alternatives and similar repositories for DeCo
Users interested in DeCo are comparing it to the repositories listed below.
- A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo (☆ 34, updated Aug 12, 2024)
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs (☆ 176, updated Oct 6, 2025)
- ☆ 13, updated Aug 7, 2025
- ☆ 11, updated Oct 2, 2024
- ☆ 12, updated May 19, 2024
- [NeurIPS'24 D&B] Official dataloader and evaluation scripts for LongVideoBench (☆ 113, updated Jul 27, 2024)
- [NeurIPS 2024] Official repository of Multi-Object Hallucination in Vision-Language Models (☆ 34, updated Nov 13, 2024)
- [ICLR'25] Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (☆ 97, updated Nov 30, 2025)
- HallE-Control: Controlling Object Hallucination in LMMs (☆ 31, updated Apr 10, 2024)
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … (☆ 129, updated Apr 4, 2025)
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study (☆ 16, updated Nov 22, 2024)
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" (☆ 33, updated Oct 12, 2024)
- Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning (☆ 29, updated Sep 12, 2025)
- ☆ 15, updated May 23, 2022
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding" (☆ 110, updated Dec 4, 2024)
- If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions (☆ 17, updated Apr 4, 2024)
- Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding (☆ 62, updated Sep 1, 2025)
- [ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models (☆ 46, updated Jan 8, 2025)
- LEO: A powerful Hybrid Multimodal LLM (☆ 19, updated Jan 18, 2025)
- ☆ 22, updated Mar 7, 2025
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … (☆ 430, updated Dec 22, 2024)
- [NeurIPS 2024] Dense Connector for MLLMs (☆ 181, updated Oct 14, 2024)
- ☆ 27, updated Jul 23, 2025
- ☆ 22, updated Jan 14, 2026
- [AAAI 2025] Enhance Vision-Language Alignment with Noise (☆ 25, updated Dec 19, 2024)
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs (☆ 54, updated Mar 9, 2025)
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" (☆ 204, updated Jul 17, 2025)
- ☆ 23, updated Aug 17, 2024
- ☆ 26, updated Feb 20, 2025
- [ICLR'25] Official code for the paper "MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs" (☆ 349, updated Apr 20, 2025)
- ☆ 359, updated Jan 27, 2024
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention (☆ 61, updated Jul 16, 2024)
- [ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation (☆ 106, updated Feb 6, 2026)
- ☆ 50, updated Oct 29, 2023
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning (☆ 296, updated Mar 13, 2024)
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" (☆ 108, updated Aug 21, 2025)
- Preference Learning for LLaVA (☆ 59, updated Nov 9, 2024)
- Repo for the EMNLP 2023 paper "A Simple Knowledge-Based Visual Question Answering" (☆ 25, updated Dec 14, 2023)
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" (☆ 31, updated Dec 23, 2024)