rom1504 / laion-prepro
Get hundreds of millions of image+url pairs from the crawling at home dataset and preprocess them
☆223 · May 26, 2024 · Updated last year
Alternatives and similar repositories for laion-prepro
Users that are interested in laion-prepro are comparing it to the libraries listed below
- Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine. ☆4,358 · Oct 19, 2025 · Updated 3 months ago
- Easily compute clip embeddings and build a clip retrieval system with them ☆2,726 · Aug 15, 2025 · Updated 5 months ago
- Description and pointers of laion datasets ☆250 · Nov 5, 2022 · Updated 3 years ago
- Aim for the moon. If you miss, you may hit a star. ☆164 · Feb 14, 2023 · Updated 2 years ago
- Efficiently read embedding in streaming from any filesystem ☆104 · Aug 9, 2025 · Updated 6 months ago
- DataComp: In search of the next generation of multimodal datasets ☆768 · Apr 28, 2025 · Updated 9 months ago
- A CLIP conditioned Decision Transformer. ☆22 · Jul 14, 2021 · Updated 4 years ago
- Un-*** 50 billions multimodality dataset ☆23 · Sep 14, 2022 · Updated 3 years ago
- Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text, ... ☆322 · Dec 9, 2023 · Updated 2 years ago
- ☆112 · Aug 5, 2021 · Updated 4 years ago
- CLIP-like model evaluation ☆800 · Jan 15, 2026 · Updated 3 weeks ago
- ☆64 · Nov 4, 2021 · Updated 4 years ago
- Using pretrained encoder and language models to generate captions from multimedia inputs. ☆100 · Mar 11, 2023 · Updated 2 years ago
- Script and models for clustering LAION-400m CLIP embeddings. ☆26 · Jan 10, 2022 · Updated 4 years ago
- ☆18 · Nov 7, 2022 · Updated 3 years ago
- [CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jon… ☆68 · Dec 17, 2022 · Updated 3 years ago
- checkpoints for glide finetuned on laion and other datasets. wip. ☆50 · Aug 17, 2022 · Updated 3 years ago
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ☆320 · Jun 3, 2024 · Updated last year
- I have created a dataset of Image-Text-Pairs by using the cosine similarity of the CLIP embeddings of the image & its caption derived f… ☆15 · Apr 22, 2021 · Updated 4 years ago
- Official implementation of VQ-Diffusion ☆975 · Apr 17, 2024 · Updated last year
- Pretrained Dalle2 from laion ☆504 · Apr 15, 2023 · Updated 2 years ago
- OpenAI CLIP based image generator with complex config file controlled transformation and training pipelines ☆19 · Jan 4, 2022 · Updated 4 years ago
- OpenAI CLIP text encoders for multiple languages! ☆825 · May 15, 2023 · Updated 2 years ago
- An open source implementation of CLIP. ☆13,353 · Nov 4, 2025 · Updated 3 months ago
- Colab notebook to finetune GLIDE. ☆12 · Mar 22, 2022 · Updated 3 years ago
- Refactoring dalle-pytorch and taming-transformers for TPU VM ☆60 · Aug 30, 2021 · Updated 4 years ago
- Code for reproducing the experiments on large-scale pre-training and transfer learning for the paper "Effect of large-scale pre-training … ☆19 · May 29, 2022 · Updated 3 years ago
- This is a summary of easily available datasets for generalized DALLE-pytorch training. ☆130 · Apr 19, 2022 · Updated 3 years ago
- ☆48 · Aug 2, 2021 · Updated 4 years ago
- A concise but complete implementation of CLIP with various experimental improvements from recent papers ☆722 · Oct 16, 2023 · Updated 2 years ago
- COYO-700M: Large-scale Image-Text Pair Dataset ☆1,251 · Nov 30, 2022 · Updated 3 years ago
- Simple script to re-rank images using OpenAI's CLIP https://github.com/openai/CLIP. ☆15 · May 3, 2021 · Updated 4 years ago
- Repository for the paper "Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images" ☆451 · Apr 28, 2023 · Updated 2 years ago
- RUDOLPH: One Hyper-Tasking Transformer can be creative as DALL-E and GPT-3 and smart as CLIP ☆254 · Feb 6, 2023 · Updated 3 years ago
- Automatically create Faiss knn indices with the most optimal similarity search parameters. ☆894 · Nov 4, 2025 · Updated 3 months ago
- Regularizing Generative Adversarial Networks under Limited Data (CVPR 2021) ☆166 · May 20, 2024 · Updated last year
- Finetune glide-text2im from openai on your own data. ☆88 · Updated this week
- When Dall E was a baby trained on a bit of data ☆27 · Feb 26, 2021 · Updated 4 years ago
- Big-Interleaved-Dataset ☆58 · Jan 21, 2023 · Updated 3 years ago