ustc-hyin / HiMAPLinks

Code for paper: Unraveling the Shift of Visual Information Flow in MLLMs: From Phased Interaction to Efficient Inference

☆13

Alternatives and similar repositories for HiMAP

Users that are interested in HiMAP are comparing it to the libraries listed below

Sorting:

SUSTechBruce / LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of ": LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆104Updated last year
Osilly / dynamic_llava
[ICLR 2025] The official pytorch implement of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…
☆62Updated 2 months ago
luka-group / mDPO
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆83Updated last year
lzhxmu / VTW
Code release for VTW (AAAI 2025 Oral)
☆61Updated 2 weeks ago
MileBench / MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
☆35Updated last year
NUS-TRAIL / NoisyRollout
[NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆97Updated 2 months ago
42Shawn / LLaVA-PruMerge
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆154Updated last month
ThinkMorph / ThinkMorph
The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
☆96Updated this week
ywh187 / FitPrune
☆60Updated 6 months ago
xtong-zhang / Chain-of-Focus
☆51Updated last week
shiqichen17 / VLM_Merging
Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)
☆81Updated 2 months ago
AIoT-MLSys-Lab / MEDA
[NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
☆14Updated 5 months ago
whongzhong / MMHalSnowball
Official resource for paper Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models (ACL 20…
☆12Updated last year
TideDra / VL-RLHF
A RLHF Infrastructure for Vision-Language Models
☆187Updated last year
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆86Updated 10 months ago
LALBJ / PAI
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
☆151Updated last year
LightChen233 / M3CoT
☆84Updated last year
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆166Updated 5 months ago
JinXins / Awesome-Token-Merge-for-MLLMs
A paper list about Token Merge, Reduce, Resample, Drop for MLLMs.
☆75Updated 3 weeks ago
DAMO-NLP-SG / CMM
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆50Updated 4 months ago
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆57Updated 11 months ago
ShadeCloak / ADORA
☆46Updated 7 months ago
EffiVLM-Bench / EffiVLM-Bench
☆28Updated 5 months ago
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆92Updated last year
zjunlp / Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
☆120Updated 2 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆69Updated 4 months ago
bigai-nlco / LatentSeek
Official Repository of LatentSeek
☆68Updated 5 months ago
chengzu-li / MVoT
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)
☆59Updated 7 months ago
yuezih / less-is-more
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆55Updated last year
xiaomi-research / colar
[NeurIPS 2025] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
☆59Updated 3 months ago