KishoreP1 / DetailCLIPLinks

Detail-Oriented CLIP for Fine-Grained Tasks (ICLR SSI-FM 2025)

☆55

Alternatives and similar repositories for DetailCLIP

Users that are interested in DetailCLIP are comparing it to the libraries listed below

Sorting:

wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆105Updated 5 months ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆136Updated 6 months ago
ExplainableML / flair
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
☆115Updated 2 months ago
wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆46Updated 10 months ago
wangf3014 / SCLIP
Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
☆174Updated last year
dogehhh / ReCLIP
Pytorch Implementation for CVPR 2024 paper: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
☆56Updated 2 months ago
mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆90Updated 7 months ago
lezhang7 / SAIL
[CVPR 2025 Highlight] Official Pytorch codebase for paper: "Assessing and Learning Alignment of Unimodal Vision and Language Models"
☆51Updated 3 months ago
jiaosiyu1999 / MAFT
☆59Updated last year
mc-lan / ProxyCLIP
[ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
☆111Updated 7 months ago
meetdavidwan / crg
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"
☆37Updated last year
ExplainableML / cosmos
[CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
☆35Updated 7 months ago
kkakkkka / ETRIS
[ICCV-2023] The official code of Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
☆138Updated 4 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆168Updated last year
mrwu-mac / ControlMLLM
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆197Updated 4 months ago
mzhaoshuai / RLCF
[ICLR 2024] Test-Time RL with CLIP Feedback for Vision-Language Models.
☆95Updated last month
megvii-research / CasPL
☆49Updated 9 months ago
Hoar012 / RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
☆74Updated 3 months ago
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.
☆29Updated last year
lloongx / DIKI
[ECCV 2024] Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
☆54Updated last year
mlvlab / RPO
Official Implementation of "Read-only Prompt Optimization for Vision-Language Few-shot Learning", ICCV 2023
☆54Updated 2 years ago
Rubics-Xuan / MRES
This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…
☆72Updated last year
leaves162 / CLIPtrase
cliptrase
☆47Updated last year
aimagelab / freeda
FreeDA: Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024)
☆47Updated last year
Hanzy1996 / OpenSeg-R
OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning
☆26Updated 5 months ago
LeapLabTHU / GSVA
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
☆152Updated last year
iancovert / locality-alignment
☆53Updated 10 months ago
Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆59Updated last year
linyq2117 / TagCLIP
[AAAI 2024] TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training
☆104Updated last year
eric-ai-lab / GRIT
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆162Updated last month