kakaobrain / coyo-dataset
COYO-700M: Large-scale Image-Text Pair Dataset
⭐ 1,217 · Updated 2 years ago
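For orientation, here is a minimal sketch of pulling the COYO-700M metadata with the Hugging Face `datasets` library. The hub ID `kakaobrain/coyo-700m` and the `url`/`text` column names are assumptions based on the dataset card; the dataset ships image URLs and captions, not the image bytes themselves.

```python
# Minimal sketch, assuming the COYO-700M metadata is published on the Hugging Face Hub
# as "kakaobrain/coyo-700m" with "url" and "text" columns (URLs + captions, no image bytes).
from datasets import load_dataset

# Stream the split so the full ~700M-row metadata is not downloaded up front.
coyo = load_dataset("kakaobrain/coyo-700m", split="train", streaming=True)

for i, sample in enumerate(coyo):
    print(sample["url"], sample["text"])
    if i == 4:  # peek at the first five image-text pairs
        break
```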
Alternatives and similar repositories for coyo-dataset
Users interested in coyo-dataset are comparing it to the libraries listed below.
- A concise but complete implementation of CLIP with various experimental improvements from recent papers ⭐ 711 · Updated last year
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch ⭐ 1,241 · Updated 2 years ago
- Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in Pytorch ⭐ 895 · Updated last year
- Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, arXiv 2022 / ICCV 2023 ⭐ 1,328 · Updated last year
- GIT: A Generative Image-to-text Transformer for Vision and Language ⭐ 566 · Updated last year
- Official implementation of VQ-Diffusion ⭐ 939 · Updated last year
- Official Implementation of Paella https://arxiv.org/abs/2211.07292v2 ⭐ 744 · Updated last year
- Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023) ⭐ 1,939 · Updated last year
- Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers" ⭐ 951 · Updated 2 years ago
- ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ⭐ 1,447 · Updated 2 months ago
- CLIP-like model evaluation ⭐ 717 · Updated last week
- Implementation of Parti, Google's pure attention-based text-to-image neural network, in Pytorch ⭐ 532 · Updated last year
- Zero-shot Image-to-Image Translation [SIGGRAPH 2023] ⭐ 1,120 · Updated 7 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. ⭐ 930 · Updated 2 months ago
- [CVPR 2022] Official PyTorch Implementation for DiffusionCLIP: Text-guided Image Manipulation Using Diffusion Models ⭐ 840 · Updated 2 years ago
- Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion" ⭐ 1,420 · Updated 2 years ago
- DataComp: In search of the next generation of multimodal datasets ⭐ 710 · Updated last month
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ⭐ 455 · Updated last year
- Diffusion attentive attribution maps for interpreting Stable Diffusion. ⭐ 759 · Updated last year
- Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch ⭐ 1,141 · Updated last year
- Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors ⭐ 335 · Updated 2 years ago
- Large-scale text-video dataset. 10 million captioned short videos. ⭐ 636 · Updated 9 months ago
- Open reproduction of MUSE for fast text2image generation. ⭐ 350 · Updated last year
- Official Implementation for "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models" (SIGGRAPH 2023) ⭐ 736 · Updated last year
- Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine (see the sketch after this list). ⭐ 4,043 · Updated 9 months ago
- Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs". ⭐ 482 · Updated last year
- Yet another PyTorch implementation of Stable Diffusion (probably easy to read)
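The img2dataset entry above is the usual way to turn URL/caption metadata like COYO's into an actual image dataset. A hedged sketch of its Python entry point follows; the metadata path and the `url`/`text` column names are illustrative assumptions, not taken from this listing.

```python
# Sketch only: img2dataset exposes a download() function mirroring its CLI options.
# The parquet directory and column names below are assumptions for illustration.
from img2dataset import download

download(
    url_list="coyo-metadata/",    # directory of parquet shards with URL/caption columns (assumed layout)
    input_format="parquet",
    url_col="url",
    caption_col="text",
    output_format="webdataset",   # write .tar shards ready for webdataset-style loaders
    output_folder="coyo-images",
    image_size=256,               # resize images on the fly
    processes_count=16,
    thread_count=64,
)
```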