seilk/LocalizationHeads

seilk / LocalizationHeads

[CVPR 2025 Highlight] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

☆79

Alternatives and similar repositories for LocalizationHeads

Users that are interested in LocalizationHeads are comparing it to the libraries listed below.

seilk / VisAttnSink
[ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models
☆116 · Feb 16, 2025 · Updated last year
MICV-yonsei / LocalizationHeads
[CVPR 2025] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
☆17 · Oct 4, 2025 · Updated 9 months ago
ZhangqiJiang07 / middle_layers_indicating_hallucinations
[CVPR 2025] Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Att…
☆84 · Oct 9, 2025 · Updated 9 months ago
zifuwan / ONLY
[ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
☆51 · Jul 7, 2025 · Updated last year
NVlabs / FRAG
☆15 · Apr 25, 2025 · Updated last year
nickjiang2378 / vlm-hallucinations
[ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"
☆106 · Nov 30, 2025 · Updated 7 months ago
xmed-lab / TAM
[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs
☆189 · Dec 14, 2025 · Updated 7 months ago
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆68 · Jul 16, 2024 · Updated 2 years ago
lchen1019 / Align-TI
[ICML 2026] Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions
☆25 · Feb 11, 2026 · Updated 5 months ago
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆115 · May 29, 2025 · Updated last year
suikei-wang / RESAnything
[NeurIPS 2025] RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
☆19 · May 26, 2026 · Updated last month
ysj9909 / StAR
[ECCV 2026] StAR: Segment Anything Reasoner
☆25 · Apr 2, 2026 · Updated 3 months ago
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆47 · Oct 19, 2025 · Updated 9 months ago
luckybird1994 / IPSeg
Towards Training-free Open-world Segmentation via Image Prompt Foundation Models
☆18 · Nov 22, 2024 · Updated last year
liuting20 / MustDrop
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model
☆36 · Jan 8, 2025 · Updated last year
hukcc / SHIELD
[ICLR 2026🔥] SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
☆17 · Mar 24, 2026 · Updated 3 months ago
bscho333 / ReVisiT
[ACL 2026 Main] Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding
☆26 · Nov 21, 2025 · Updated 8 months ago
MiSsU-HH / VIP
[ICML 2026] Official implementation of "VIP: Visual-guided Prompt Evolution for Efficient Dense Vision-Language Inference"
☆16 · May 16, 2026 · Updated 2 months ago
ChengShiest / Vision-Function-Layer
[NeurIPS 2025] The official PyTorch implementation of the "Vision Function Layer in MLLM".
☆32 · Dec 18, 2025 · Updated 7 months ago
GATECH-EIC / ACT
[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…
☆45 · Jun 30, 2024 · Updated 2 years ago
om-ai-lab / ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆91 · Nov 20, 2025 · Updated 8 months ago
slonetime / EBSeg
[CVPR2024] Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
☆41 · Jan 12, 2026 · Updated 6 months ago
2btlFe / GLA-CLIP
[CVPR2026] This is the official pytorch implementation of "Looking Beyond the Window: Global-Local Aligned CLIP for Training-free Open-Vo…
☆22 · Updated this week
clemneo / llava-interp
☆86 · Nov 5, 2024 · Updated last year
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆380 · Apr 20, 2025 · Updated last year
rubato-yeong / RRM
[NeurIPS 2025] Interpreting vision transformers via residual replacement model
☆20 · Nov 3, 2025 · Updated 8 months ago
princetonvisualai / icons
☆22 · Apr 24, 2025 · Updated last year
UCSB-AI / DMLR
[CVPR2026] Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"
☆84 · May 12, 2026 · Updated 2 months ago
vladan-stojnic / LPOSS
Code for LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation (CVPR2025)
☆24 · Nov 8, 2025 · Updated 8 months ago
spatigen / milr
Official code of paper: MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
☆18 · Feb 12, 2026 · Updated 5 months ago
Ziwei-Zheng / Nullu
Code for paper: Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
☆63 · Mar 13, 2025 · Updated last year
songw-zju / PixelThink
The official implementation of "PixelThink: Towards Efficient Chain-of-Pixel Reasoning" (ICML 2026)
☆43 · Jul 4, 2026 · Updated 2 weeks ago
dahyun-kang / lavg
[ECCV'24] Official PyTorch implementation of In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
☆51 · Sep 24, 2024 · Updated last year
NJU-LHRS / ScoreRS
Code and updates for the ScoreRS project.
☆44 · Sep 19, 2025 · Updated 10 months ago
wrudman / NOTICE
☆14 · Apr 10, 2025 · Updated last year
linhuixiao / HiVG
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
☆65 · Nov 10, 2025 · Updated 8 months ago
MICV-yonsei / CASS
[CVPR 2025] Official Pytorch Code for Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
☆50 · Mar 27, 2025 · Updated last year
AHideoKuzeA / Evol-SAM3
☆47 · Jan 1, 2026 · Updated 6 months ago
MasahiroAraki / SpeechRecognition
Support page for the book "Building Speech Recognition Systems with Free Software (2nd Edition)" (Morikita Publishing, 2017)
☆10 · Aug 16, 2023 · Updated 2 years ago