cardinalblue / clip-models-for-distillation
☆19Updated last year
Related projects:
- [FGVC9-CVPR 2022] The second-place solution for the 2nd eBay eProduct Visual Search Challenge.☆26Updated 2 years ago
- ☆25Updated 3 years ago
- Research code for "Training Vision-Language Transformers from Captions Alone"☆34Updated 2 years ago
- A non-JIT implementation / replication of OpenAI's CLIP in PyTorch☆34Updated 3 years ago
- CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification - 4th Workshop on Computer Vision for Fashion, Art, and Design☆27Updated 2 years ago
- ☆25Updated 3 years ago
- Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".☆17Updated 3 years ago
- This is the official repository for CookGAN: Meal Image Synthesis from Ingredients☆23Updated last year
- Use CLIP to represent video for retrieval tasks☆67Updated 3 years ago
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)☆27Updated last year
- Implementation of our PR 2020 paper: Unsupervised Text-to-Image Synthesis☆13Updated 4 years ago
- Rethinking Nearest Neighbors for Visual Classification☆31Updated 2 years ago
- ☆43Updated 3 years ago
- [ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources☆42Updated last year
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆72Updated last year
- [NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images☆58Updated 2 years ago
- ☆32Updated 2 years ago
- ☆29Updated 2 years ago
- This project provides a data set with bounding boxes, body poses, 3D face meshes & captions of people from our LAION-2.2B. Additionally i…☆13Updated 2 years ago
- Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆38Updated last year
- ☆11Updated 4 years ago
- [BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"☆52Updated last year
- ☆24Updated 3 years ago
- [Arxiv2022] Revitalize Region Feature for Democratizing Video-Language Pre-training☆21Updated 2 years ago
- MDMMT: Multidomain Multimodal Transformer for Video Retrieval☆26Updated 3 years ago
- Large-Scale Bidirectional Training for Zero-Shot Image Captioning☆21Updated last year
- ☆19Updated this week
- Official PyTorch implementation of the paper "Semantic Diversity Learning for Zero-Shot Multi-label Classification" (ICCV 2021)☆30Updated 2 years ago
- Official repository for the General Robust Image Task (GRIT) Benchmark☆48Updated last year
- ☆47Updated 3 years ago