ivonajdenkoska / tulip

[ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP"

☆33

Alternatives and similar repositories for tulip

Users who are interested in tulip are comparing it to the repositories listed below.

QUVA-Lab / PIN
Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
☆26 Updated 11 months ago
wjpoom / SPEC
[CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
☆50 Updated 6 months ago
naver-ai / prolip
☆55 Updated 4 months ago
StanfordMIMI / villa
[ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data
☆46 Updated 2 years ago
iancovert / locality-alignment
☆53 Updated 11 months ago
tripletclip / TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
☆46 Updated last year
QUVA-Lab / SIGMA
☆20 Updated 5 months ago
UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆38 Updated 8 months ago
ytaek-oh / vl_compo
☆10 Updated last year
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
☆29 Updated last year
jiaangli / VLCA
Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
☆16 Updated last year
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆37 Updated last year
james-oldfield / muMoE
[NeurIPS'24] Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
☆38 Updated last year
hammoudhasan / DiffCLIP
Official Implementation of DiffCLIP: Differential Attention Meets CLIP
☆48 Updated 9 months ago
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆25 Updated last year
McGill-NLP / diffusion-itm
Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"
☆33 Updated last year
peterant330 / KUEA
[ICML'25] Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models
☆19 Updated 3 months ago
sarahESL / AlignCLIP
AlignCLIP: Improving Cross-Modal Alignment in CLIP (ICLR 2025)
☆52 Updated 9 months ago
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆42 Updated 8 months ago
NMS05 / Patch-Aligned-Contrastive-Learning
☆23 Updated 2 years ago
Razaimam45 / TTL-Test-Time-Low-Rank-Adaptation
Official code repository of paper titled "Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Visio…
☆31 Updated 7 months ago
k1rezaei / Text-to-concept
☆35 Updated last year
alhojel / visual_task_vectors
☆40 Updated last year
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 Updated 2 years ago
kaist-ami / BEAF
[ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"
☆21 Updated 9 months ago
layer6ai-labs / fusemix
Data-Efficient Multimodal Fusion on a Single GPU
☆68 Updated last year
hammoudhasan / SynthCLIP
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
☆101 Updated 9 months ago
Understanding-Visual-Datasets / VisDiff
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
☆129 Updated last month
showlab / datacentric.vlp
Compress conventional Vision-Language Pre-training data
☆52 Updated 2 years ago
WalBouss / GEM
[CVPR24] Official Implementation of GEM (Grounding Everything Module)
☆134 Updated 8 months ago