google-research-datasets / conceptual-12m
Conceptual 12M (CC12M) is a dataset of roughly 12 million (image-URL, caption) pairs collected for vision-and-language pre-training.
☆377 · Updated last year
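The release is a plain list of pairs, so the most basic way to consume it is to stream the file and fetch each URL. Below is a minimal sketch, assuming the list has been downloaded as a tab-separated file (`cc12m.tsv` is a placeholder name) with one image URL and one caption per row:

```python
# A minimal sketch (not the official loader): iterate the (image-URL, caption)
# TSV and fetch a few images. "cc12m.tsv" is a placeholder path; adjust it to
# wherever the release file lives.
import csv
import io

import requests
from PIL import Image

def iter_pairs(tsv_path, limit=5):
    """Yield the first `limit` (url, caption) rows from the TSV."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.reader(f, delimiter="\t")):
            if i >= limit:
                break
            yield row[0], row[1]

for url, caption in iter_pairs("cc12m.tsv"):
    try:
        # Image hosts disappear over time, so any individual fetch may fail.
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        image = Image.open(io.BytesIO(resp.content)).convert("RGB")
        print(image.size, caption[:60])
    except Exception as exc:
        print(f"skipped {url}: {exc}")
```

For anything beyond spot-checking, the bulk downloaders in the list below are a better fit: a noticeable fraction of the URLs go stale over time, so a robust downloader with retries and resizing saves a lot of pain.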
Alternatives and similar repositories for conceptual-12m:
Users interested in conceptual-12m are comparing it to the libraries listed below.
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic ☆272 · Updated 2 years ago
- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm ☆643 · Updated 2 years ago
- A concise but complete implementation of CLIP with various experimental improvements from recent papers ☆705 · Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ☆309 · Updated 7 months ago
- Code release for SLIP: Self-supervision meets Language-Image Pre-training ☆754 · Updated last year
- Get hundreds of millions of image+URL pairs from the crawling@home dataset and preprocess them ☆215 · Updated 8 months ago
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21] ☆355 · Updated 2 years ago
- Multi-modality pre-training ☆479 · Updated 8 months ago
- Language Models Can See: Plugging Visual Controls in Text Generation ☆257 · Updated 2 years ago
- Official code for CLIPScore (EMNLP 2021); see the sketch after this list ☆210 · Updated 2 years ago
- A PyTorch Lightning solution to training OpenAI's CLIP from scratch ☆677 · Updated 2 years ago
- Code for the paper "LAFITE: Towards Language-Free Training for Text-to-Image Generation" (CVPR 2022) ☆181 · Updated last year
- CapDec: SOTA zero-shot image captioning using CLIP and GPT-2, EMNLP 2022 (Findings) ☆189 · Updated last year
- CLIP-like model evaluation ☆656 · Updated 5 months ago
- [CVPR 2022] Official code for "Unified Contrastive Learning in Image-Text-Label Space" ☆390 · Updated last year
- [ICLR 2022] Code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383 ☆409 · Updated 2 years ago
- Large-scale text-video dataset: 10 million captioned short videos ☆619 · Updated 5 months ago
- Project page for VinVL ☆351 · Updated last year
- An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities ☆158 · Updated 2 years ago
- PyTorch code for MUST ☆106 · Updated last year
- Densely Captioned Images (DCI) dataset repository ☆168 · Updated 6 months ago
- Generate text captions for images from their embeddings ☆102 · Updated last year
- Reliably download millions of images efficiently ☆114 · Updated 3 years ago
- CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet ☆212 · Updated 2 years ago
- Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis ☆312 · Updated last year
- Recent Advances in Vision and Language Pre-training (VLP) ☆290 · Updated last year
- ☆221 · Updated last year
- ☆98 · Updated 3 months ago
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) ☆158 · Updated last year
- [CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning ☆208 · Updated 2 years ago
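Several entries above use CLIP to score or filter (image, caption) pairs; the CLIPScore item links the official code for the reference-free metric CLIPScore(c, v) = 2.5 · max(cos(c, v), 0), where c and v are the CLIP text and image embeddings. Below is a minimal sketch of that formula using a Hugging Face `transformers` CLIP checkpoint rather than the official repo's own pipeline, so absolute scores may differ slightly:

```python
# A minimal sketch of the CLIPScore formula, 2.5 * max(cos(c, v), 0), using a
# Hugging Face CLIP checkpoint. This is not the official repo's pipeline;
# "example.jpg" is a placeholder path.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_score(image: Image.Image, caption: str) -> float:
    """Reference-free caption quality: scaled cosine similarity in CLIP space."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    cos = F.cosine_similarity(image_emb, text_emb).item()
    return 2.5 * max(cos, 0.0)

print(clip_score(Image.open("example.jpg"), "a dog catching a frisbee"))
```

Because the score needs only the candidate caption and the image, with no reference captions, it is convenient for filtering web-scale pair datasets like this one.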