ErikZ719 / MCA-LLaVA

[ACM MM25] MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models

☆10

Alternatives and similar repositories for MCA-LLaVA

Users that are interested in MCA-LLaVA are comparing it to the libraries listed below

yu-rp / VisualPerceptionToken
☆89 · Updated 3 months ago
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆37 · Updated last year
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆69 · Updated last year
umd-huang-lab / SIMA
☆9 · Updated 2 months ago
MME-Benchmarks / MME-CoT
MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
☆117 · Updated 3 weeks ago
yaolinli / DeCo
Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models
☆40 · Updated this week
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆59 · Updated 5 months ago
zhishuifeiqian / VCR-Bench
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
☆32 · Updated 2 weeks ago
mrwu-mac / ControlMLLM
[NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
☆184 · Updated this week
Hui-design / Open-LLaVA-Video-R1
[LLaVA-Video-R1] ✨ First Adaptation of R1 to LLaVA-Video (2025-03-18)
☆29 · Updated 2 months ago
yaolinli / TimeChat-Online
[ACM MM 2025] TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
☆60 · Updated this week
ywh187 / FitPrune
☆53 · Updated 2 months ago
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆115 · Updated 4 months ago
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆82 · Updated last month
PhoenixZ810 / RISEBench
Official repository of the paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
☆71 · Updated this week
zjunlp / Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
☆89 · Updated 7 months ago
ustc-hyin / ClearSight
Code for the paper: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
☆24 · Updated 7 months ago
Stevetich / EventHallusion
EventHallusion: Diagnosing Event Hallucinations in Video LLMs
☆31 · Updated 3 months ago
Yaxin9Luo / Gamma-MOD
[ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆37 · Updated 5 months ago
eric-ai-lab / GRIT
Official code for the paper "GRIT: Teaching MLLMs to Think with Images"
☆109 · Updated this week
yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
☆96 · Updated 9 months ago
xinyan-cxy / MINT-CoT
☆57 · Updated last month
appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆44 · Updated last month
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆48 · Updated 3 months ago
gyhdog99 / RACRO2
Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)
☆15 · Updated 2 weeks ago
mengchuang123 / VASparse-github
[CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
☆34 · Updated 3 months ago
GuangyanS / Sys2-LLaVA
☆25 · Updated 5 months ago
minglllli / CLS-RL
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
☆51 · Updated last month
xuyang-liu16 / GlobalCom2
🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆30 · Updated last month
LaVi-Lab / AIM
[ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"
☆32 · Updated 3 weeks ago