XIAO4579 / Vlm-interpretability

Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models"

☆18

Alternatives and similar repositories for Vlm-interpretability

Users who are interested in Vlm-interpretability are comparing it to the repositories listed below.

seilk / VisAttnSink
[ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models
☆48 Updated 7 months ago
wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆45 Updated 8 months ago
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65 Updated 3 months ago
zifuwan / ONLY
[ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
☆36 Updated 2 months ago
Haochen-Wang409 / TreeVGR
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
☆63 Updated 2 months ago
leaves162 / CLIPtrase
cliptrase
☆46 Updated last year
Shengcao-Cao / groundLMM
Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision
☆40 Updated 5 months ago
xmed-lab / TAM
[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs
☆78 Updated last month
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆45 Updated last year
eric-ai-lab / GRIT
Official code for paper "GRIT: Teaching MLLMs to Think with Images"
☆128 Updated last month
rui-qian / READ
Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)
☆42 Updated 3 weeks ago
yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
☆102 Updated 11 months ago
zhengyuan-xie / ECCV24_NeST
[ECCV 2024] Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
☆32 Updated 6 months ago
mrwu-mac / ControlMLLM
[NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
☆192 Updated 2 months ago
mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆82 Updated 5 months ago
JiuTian-VL / MoME
[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆72 Updated 4 months ago
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆103 Updated 3 months ago
xing0047 / cca-llava
[NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention
☆61 Updated 3 weeks ago
lloongx / DIKI
[ECCV 2024] Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
☆53 Updated last year
Theia-4869 / VisPruner
[ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
☆30 Updated 2 months ago
DreamMr / RAP
Code for Retrieval-Augmented Perception (ICML 2025)
☆53 Updated last month
om-ai-lab / ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆53 Updated 3 weeks ago
zhangquanchen / VisRL
[ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
☆39 Updated 3 months ago
see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
☆43 Updated last year
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
☆29 Updated last year
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆91 Updated 11 months ago
maifoundations / Visionary-R1
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
☆37 Updated 2 months ago
congvvc / InstructSeg
[ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"
☆47 Updated 7 months ago
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆88 Updated 3 months ago
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆58 Updated 10 months ago