google-research-datasets / wit
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
Related projects:
- OpenAI CLIP text encoders for multiple languages
- Implementation of 🦩 Flamingo, DeepMind's state-of-the-art few-shot visual question answering attention network, in PyTorch
- Code release for SLIP: Self-supervision meets Language-Image Pre-training
- CLIP-like model evaluation
- Oscar and VinVL
- Automatically create faiss knn indices with the most optimal similarity search parameters
- Implementation of CoCa (Contrastive Captioners are Image-Text Foundation Models) in PyTorch
- Conceptual 12M: a dataset of (image-URL, caption) pairs collected for vision-and-language pre-training
- Implementation of RETRO, DeepMind's retrieval-based attention network, in PyTorch
- A concise but complete implementation of CLIP with various experimental improvements from recent papers
- A PyTorch Lightning solution for training OpenAI's CLIP from scratch
- MultimodalC4: a multimodal extension of C4 that interleaves millions of images with text
- DataComp: In search of the next generation of multimodal datasets
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs"
- Flexible components pairing 🤗 Transformers with PyTorch Lightning
- Research code for pixel-based encoders of language (PIXEL)
- Robust fine-tuning of zero-shot models
- Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L…
- Multi-task vision and language
- The implementation of DeBERTa
- Pix2Seq codebase: multi-task generative modeling (autoregressive and diffusion)
- TorchMultimodal: a PyTorch library for training state-of-the-art multimodal multi-task models at scale
- Conceptual Captions: a dataset of (image-URL, caption) pairs designed for training and evaluating machine-learned image …
- Easily turn large sets of image URLs into an image dataset; can download, resize, and package 100M URLs in 20 hours on one machine
- COYO-700M: a large-scale image-text pair dataset
- Language Models Can See: Plugging Visual Controls in Text Generation
- Easily compute CLIP embeddings and build a CLIP retrieval system with them
- Task-based datasets, preprocessing, and evaluation for sequence models
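Several of the tools above (e.g. autofaiss and clip-retrieval) automate the same core operation: nearest-neighbor search over L2-normalized image/text embeddings. A minimal NumPy sketch of that idea, using random vectors as stand-ins for CLIP embeddings (the brute-force search shown here is what faiss replaces with approximate indices at 100M+ scale):

```python
import numpy as np

def normalize(x):
    # L2-normalize rows so a dot product equals cosine similarity
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def knn(queries, index, k=3):
    # Brute-force cosine-similarity search: score every index vector
    # against every query, then keep the top-k per query.
    sims = normalize(queries) @ normalize(index).T
    topk = np.argsort(-sims, axis=1)[:, :k]
    return topk, np.take_along_axis(sims, topk, axis=1)

rng = np.random.default_rng(0)
index = rng.normal(size=(1000, 64))                   # stand-in "image" embeddings
queries = index[:2] + 0.01 * rng.normal(size=(2, 64)) # lightly perturbed copies
ids, scores = knn(queries, index, k=3)
# Each query's nearest neighbor is the vector it was perturbed from.
```

Libraries like faiss trade this exact search for quantized or inverted-file indices, which is why autofaiss focuses on picking those index parameters automatically.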