[CVPR 2025 Highlight] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
☆61Aug 31, 2025Updated 6 months ago
Alternatives and similar repositories for LocalizationHeads
Users that are interested in LocalizationHeads are comparing it to the libraries listed below
Sorting:
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models☆92Feb 16, 2025Updated last year
- RESAnything: Attribute Prompting for Arbitrary Referring Segmentation☆17Nov 28, 2025Updated 3 months ago
- Towards Training-free Open-world Segmentation via Image Prompt Foundation Models,☆18Nov 22, 2024Updated last year
- [CVPR 2025] Official Pytorch Code for Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation☆46Mar 27, 2025Updated 11 months ago
- Code for paper: Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection☆51Mar 13, 2025Updated 11 months ago
- code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation"☆21Nov 24, 2025Updated 3 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…☆46Jun 30, 2024Updated last year
- [CVPR25] CoLLM: A Large Language Model for Composed Image Retrieval☆28Mar 26, 2025Updated 11 months ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning☆41Aug 4, 2025Updated 6 months ago
- This repo is the official pytorch implementation of the paper: CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-V…☆40Sep 10, 2025Updated 5 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model☆37Jan 8, 2025Updated last year
- [CVPR2024] Open-Vocabulary Semantic Segmentation with Image Embedding Balancing☆40Jan 12, 2026Updated last month
- A collection of awesome think with videos papers.☆89Dec 1, 2025Updated 2 months ago
- SurgLaVi: Large-Scale Hierarchical Datasets for Surgical Vision–Language Representation Learning☆23Feb 2, 2026Updated 3 weeks ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆44Apr 18, 2025Updated 10 months ago
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation☆13Jun 17, 2024Updated last year
- ☆13Aug 28, 2024Updated last year
- Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation☆12Dec 5, 2025Updated 2 months ago
- Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation"☆45Mar 25, 2025Updated 11 months ago
- [CVPR 2025 HIghlight] XLRS-Bench: ould Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?☆54Oct 31, 2025Updated 4 months ago
- Rare-to-Frequent (R2F), ICLR'25, Spotlight☆53Apr 23, 2025Updated 10 months ago
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.☆106Jun 29, 2025Updated 8 months ago
- Official code for LaRE2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection. (CVPR 2024)☆51Dec 3, 2024Updated last year
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".☆84Jul 10, 2025Updated 7 months ago
- [ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs☆176Dec 14, 2025Updated 2 months ago
- This project is a demonstration of a content-based recommendation system for Spotify that leverages user's preferences and audio features …☆17Apr 4, 2023Updated 2 years ago
- ☆12Apr 26, 2022Updated 3 years ago
- 用于自动预约民政局婚姻登记处的号,限广东省民政局☆10Jun 25, 2023Updated 2 years ago
- ☆20Nov 21, 2025Updated 3 months ago
- EgoToM is an egocentric theory-of-mind benchmark built on Ego4D videos, containing multi-choice questions that evaluate multimodal large …☆13Apr 1, 2025Updated 10 months ago
- (TGRS2023) PyTorch implementation of "EARL: An Elliptical Distribution aided Adaptive Label Assignment for Oriented Object Detection in R…☆14Oct 11, 2023Updated 2 years ago
- ☆11Jul 2, 2022Updated 3 years ago
- [CVPR 2023] "TrojViT: Trojan Insertion in Vision Transformers" by Mengxin Zheng, Qian Lou, Lei Jiang☆14Jan 5, 2024Updated 2 years ago
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding.☆11Jul 16, 2024Updated last year
- [CVPR'25] Official code of paper "Mimic In-Context Learning for Multimodal Tasks"☆24Jun 8, 2025Updated 8 months ago
- (2024) The Official Repository of Paper "SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite …☆14Feb 7, 2024Updated 2 years ago
- [CVPR 2025] Code for "Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering".☆20Jun 16, 2025Updated 8 months ago
- Source codes for the paper "Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning" (PDMER) which p…☆14Mar 24, 2025Updated 11 months ago
- ☆10Mar 31, 2025Updated 11 months ago