tripletclip / TripletCLIP

[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"

☆46

Alternatives and similar repositories for TripletCLIP

Users who are interested in TripletCLIP are comparing it to the repositories listed below.

Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
☆29 Updated last year
NMS05 / Patch-Aligned-Contrastive-Learning
☆23 Updated 2 years ago
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆42 Updated 2 months ago
UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆38 Updated 8 months ago
wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆47 Updated 11 months ago
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆61 Updated last year
wjpoom / SPEC
[CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
☆50 Updated 6 months ago
rui-qian / READ
Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)
☆51 Updated 2 months ago
ytaek-oh / vl_compo
☆10 Updated last year
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆109 Updated 7 months ago
iancovert / locality-alignment
☆53 Updated 11 months ago
hammoudhasan / DiffCLIP
Official Implementation of DiffCLIP: Differential Attention Meets CLIP
☆50 Updated 9 months ago
Paranioar / UniPT
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
☆68 Updated last year
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆53 Updated 8 months ago
heliossun / SQ-LLaVA
Visual self-questioning for large vision-language assistant.
☆45 Updated 5 months ago
zycheiheihei / Transferable-Visual-Prompting
[CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…
☆46 Updated last year
SivanDoveh / IPLoc
Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples
☆39 Updated last year
mlvlab / DAPT
Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)
☆44 Updated 2 years ago
Han-Zongbo / Skip-n
This repository contains the code of our paper 'Skip \n: A simple method to reduce hallucination in Large Vision-Language Models'.
☆15 Updated last year
CVMI-Lab / clip-beyond-tail
(NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
☆28 Updated last year
mlvlab / RALF
Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
☆44 Updated last year
chenshuang-zhang / imagenet_d
[CVPR 2024 Highlight] ImageNet-D
☆46 Updated last year
Qinying-Liu / TagAlign
Official implementation of TagAlign
☆35 Updated last year
see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
☆46 Updated last year
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆80 Updated last year
naver-ai / prolip
☆55 Updated 4 months ago
GasolSun36 / MVP
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
☆24 Updated last year
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65 Updated 6 months ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆33 Updated 9 months ago
geekyutao / TaskRes
Task Residual for Tuning Vision-Language Models (CVPR 2023)
☆74 Updated 2 years ago