bytedance/LVLM_Interpretation

The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?"

☆72

Alternatives and similar repositories for LVLM_Interpretation

Users who are interested in LVLM_Interpretation are comparing it to the repositories listed below.

ywh187 / FitPrune
☆68 · Jan 23, 2026 · Updated 6 months ago
BillChan226 / HALC
[ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"
☆115 · Dec 4, 2024 · Updated last year
xmed-lab / TAM
[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs
☆190 · Dec 14, 2025 · Updated 7 months ago
RuoyuChen10 / EAGLE
[CVPR 2026] Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation
☆44 · Jun 18, 2026 · Updated last month
zhangce01 / DeGF
[ICLR 2025] Code for Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
☆26 · Apr 14, 2025 · Updated last year
yuezih / less-is-more
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆58 · Oct 28, 2024 · Updated last year
zjunlp / Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
☆146 · Sep 11, 2025 · Updated 10 months ago
deep-spin / Infinite-Video
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
☆21 · Feb 14, 2025 · Updated last year
kaistAI / Volcano
[NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…
☆49 · Aug 21, 2024 · Updated last year
TerryPei / CSP
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference
☆10 · Dec 15, 2024 · Updated last year
Ziwei-Zheng / VaLSe
A library of visualization tools for the interpretability and hallucination analysis of large vision-language models (LVLMs).
☆42 · May 22, 2025 · Updated last year
sled-group / moh
[NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models
☆37 · Nov 13, 2024 · Updated last year
LzVv123456 / VISTA
☆86 · Jul 28, 2025 · Updated 11 months ago
tomiock / macrograd
Deep learning Framework from scratch.
☆11 · Jul 23, 2025 · Updated last year
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆68 · Jul 16, 2024 · Updated 2 years ago
liuting20 / MustDrop
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model
☆36 · Jan 8, 2025 · Updated last year
alibaba / conv-llava
☆128 · Jul 29, 2024 · Updated last year
The-Martyr / CausalMM
[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
☆66 · Jul 5, 2025 · Updated last year
UKPLab / arxiv2025-inherent-limits-plms
Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le…
☆14 · Jan 16, 2025 · Updated last year
SKYLENAGE-AI / DeepVision-103K
Codebase for DeepVision-103K
☆22 · Feb 21, 2026 · Updated 5 months ago
markywg / transagent
[NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
☆25 · Oct 17, 2024 · Updated last year
Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆114 · Jun 29, 2025 · Updated last year
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]
☆64 · Apr 11, 2026 · Updated 3 months ago
EvolvingLMMs-Lab / sae
A framework that allows you to apply Sparse AutoEncoder on any models
☆53 · Jul 11, 2025 · Updated last year
shengliu66 / FractionalReason
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17 · Jun 30, 2025 · Updated last year
yuhui-zh15 / AutoConverter
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…
☆40 · May 26, 2025 · Updated last year
itsqyh / Awesome-LMMs-Mechanistic-Interpretability
A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg…
☆215 · Mar 4, 2026 · Updated 4 months ago
bebr2 / RACE
Code for RACE.
☆15 · Nov 12, 2025 · Updated 8 months ago
jun297 / v1
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
☆21 · Updated this week
compling-wat / vlm-lens
[EMNLP 2025 Demo] Extracting internal representations from vision-language models. Beta version.
☆123 · Apr 25, 2026 · Updated 3 months ago
DripNowhy / Sherlock
[NeurIPS 2025] Official Implementation of paper "Sherlock: Self-Correcting Reasoning in Vision-Language Models"
☆31 · Jun 4, 2026 · Updated last month
UMass-Embodied-AGI / FlexAttention
[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆49 · Jan 8, 2025 · Updated last year
shengliu66 / VTI
Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering
☆117 · Nov 23, 2024 · Updated last year
jinghan1he / VHR
[ACL 2025] Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
☆21 · Jun 10, 2025 · Updated last year
clemneo / llava-interp
☆86 · Nov 5, 2024 · Updated last year
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆33 · Mar 26, 2025 · Updated last year
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆381 · Apr 20, 2025 · Updated last year
shiqichen17 / AdaptVis
Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)
☆76 · May 2, 2025 · Updated last year
xing0047 / cca-llava
[NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention
☆67 · Aug 30, 2025 · Updated 10 months ago