arampacha / CLIP-rsicd
☆200, updated 2 years ago

Related projects:
- [CVPR 2022] Official code for "Unified Contrastive Learning in Image-Text-Label Space" (☆382, updated 10 months ago)
- CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet (☆203, updated last year)
- CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings) (☆181, updated 7 months ago)
- CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks (☆346, updated last year)
- Robust fine-tuning of zero-shot models (☆629, updated 2 years ago)
- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm (☆628, updated 2 years ago)
- [NeurIPS 2022] Official repository of the paper "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary …" (☆284, updated last year)
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) (☆149, updated 9 months ago)
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time (☆409, updated 2 months ago)
- [NeurIPS 2023] Official implementation of the paper "An Inverse Scaling Law for CLIP Training" (☆292, updated 3 months ago)
- A PyTorch Lightning solution to training OpenAI's CLIP from scratch (☆654, updated 2 years ago)
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) (☆177, updated last year)
- PyTorch code for hierarchical k-means, a data curation method for self-supervised learning (☆119, updated 3 months ago)
- Masked Siamese Networks for Label-Efficient Learning (https://arxiv.org/abs/2204.07141) (☆447, updated 2 years ago)
- [ACM TOMM 2023] Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features (☆156, updated last year)
- Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?", Oral @ ICLR … (☆234, updated last year)
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic (☆261, updated 2 years ago)
- [CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining" (☆696, updated 6 months ago)
- CLIP (Contrastive Language–Image Pre-training) for Italian (☆179, updated last year)
- [NeurIPS 2022] Official PyTorch implementation of Optimizing Relevance Maps of Vision Transformers Improves Robustness (☆124, updated last year)
- Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition" (☆251, updated 4 months ago)
- An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities (☆149, updated 2 years ago)
- [NeurIPS 2023] Text data, code and pre-trained models for the paper "Improving CLIP Training with Language Rewrites" (☆251, updated 8 months ago)
- [CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting (☆507, updated last year)
- Probing the representations of Vision Transformers (☆311, updated last year)
- Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022 (☆721, updated 2 years ago)
- Conceptual 12M: a dataset of (image-URL, caption) pairs collected for vision-and-language pre-training (☆357, updated last year)
- Official PyTorch implementation of the CRIS paper (☆244, updated 3 months ago)