rui-qian / READLinks

Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)

☆45

Alternatives and similar repositories for READ

Users that are interested in READ are comparing it to the libraries listed below

Sorting:

wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆45Updated 9 months ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆32Updated 7 months ago
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆104Updated 5 months ago
leaves162 / CLIPtrase
cliptrase
☆46Updated last year
Paranioar / UniPT
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
☆67Updated last year
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆49Updated last year
lizhou-cs / mglmm
☆32Updated last year
kodenii / Ref-Diff
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
☆19Updated 5 months ago
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆49Updated 4 months ago
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆40Updated last week
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆79Updated last year
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆136Updated 5 months ago
see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
☆44Updated last year
YBZh / DMN
CVPR2024: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
☆85Updated last year
tripletclip / TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
☆44Updated 10 months ago
adobe-research / llava-score
☆11Updated last year
geekyutao / TaskRes
Task Residual for Tuning Vision-Language Models (CVPR 2023)
☆73Updated 2 years ago
mc-lan / ClearCLIP
[ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
☆94Updated 7 months ago
THU-MIG / VTC-CLS
official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"
☆23Updated 6 months ago
SivanDoveh / IPLoc
Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples
☆35Updated 11 months ago
ChocoWu / SeTok
Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆75Updated 6 months ago
tiiuae / FineLIP
code for FineLIP
☆33Updated last month
chenshuang-zhang / imagenet_d
[CVPR 2024 Highlight] ImageNet-D
☆44Updated last year
meetdavidwan / crg
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"
☆37Updated last year
zifuwan / ONLY
[ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
☆40Updated 3 months ago
aimagelab / freeda
FreeDA: Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024)
☆46Updated last year
yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
☆104Updated last year
mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆87Updated 6 months ago
heliossun / SQ-LLaVA
Visual self-questioning for large vision-language assistant.
☆45Updated 3 months ago
Qinying-Liu / TagAlign
Official implementation of TagAlign
☆35Updated 10 months ago