lnairGT / CLIP-DistillationLinks

Knowledge Distillation using Contrastive Language-Image Pretraining (CLIP) without a teacher model.

☆18

Alternatives and similar repositories for CLIP-Distillation

Users that are interested in CLIP-Distillation are comparing it to the libraries listed below

Sorting:

m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆37Updated 6 months ago
mlvlab / RALF
Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
☆44Updated last year
StanfordMIMI / villa
[ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data
☆46Updated 2 years ago
omipan / svl_adapter
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
☆20Updated last year
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆59Updated last year
mlvlab / DAPT
Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)
☆43Updated last year
sIncerass / MVLPT
code for "Multitask Vision-Language Prompt Tuning" https://arxiv.org/abs/2211.11720
☆57Updated last year
layer6ai-labs / fusemix
Data-Efficient Multimodal Fusion on a Single GPU
☆67Updated last year
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆52Updated 7 months ago
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆84Updated last year
jusiro / CLAP
[CVPR'24] Validation-free few-shot adaptation of CLIP, using a well-initialized Linear Probe (ZSLP) and class-adaptive constraints (CLAP)…
☆77Updated 5 months ago
akhtarvision / cal-detr
☆42Updated 2 years ago
microsoft / A-CLIP
Official Implementation of Attentive Mask CLIP (ICCV2023, https://arxiv.org/abs/2212.08653)
☆32Updated last year
fistyee / MixPro
🔥MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer [Official, ICLR 2023]
☆21Updated 2 years ago
iancovert / locality-alignment
☆53Updated 9 months ago
dhg-wei / MCL
(ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
☆27Updated last year
heliossun / SQ-LLaVA
Visual self-questioning for large vision-language assistant.
☆45Updated 3 months ago
jameelhassan / PromptAlign
[NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
☆108Updated last year
sMamooler / CLIP_Explainability
code for studying OpenAI's CLIP explainability
☆35Updated 3 years ago
changdaeoh / multimodal-mixup
Official implementation for NeurIPS'23 paper "Geodesic Multi-Modal Mixup for Robust Fine-Tuning"
☆35Updated last year
FeiElysia / awesome-zero-shot-captioning
A curated list of zero-shot captioning papers
☆24Updated 2 years ago
mzhaoshuai / RLCF
[ICLR 2024] Test-Time RL with CLIP Feedback for Vision-Language Models.
☆95Updated 3 weeks ago
alinlab / s-clip
S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions
☆49Updated 2 years ago
jeykigung / HiCLIP
☆30Updated 2 years ago
lixinustc / GraphAdapter
The efficient tuning method for VLMs
☆80Updated last year
cvl-umass / AdaptCLIPZS
☆43Updated last month
ivonajdenkoska / multimodal-meta-learn
[ICLR 2023] Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning"
☆60Updated 2 years ago
edgeai1 / LF-ViT
This is Pytorch implementation of our paper "LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition".
☆11Updated last year
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆40Updated 3 weeks ago
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.
☆29Updated last year