rom1504/laion-prepro

Get hundreds of millions of image+URL pairs from the Crawling@Home dataset and preprocess them

☆222

Alternatives and similar repositories for laion-prepro

Users who are interested in laion-prepro are comparing it to the libraries listed below.

rom1504 / img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20 hours on one machine.
☆4,436 · Oct 19, 2025 · Updated 9 months ago
rom1504 / clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
☆2,788 · Mar 28, 2026 · Updated 3 months ago
LAION-AI / laion-datasets
Descriptions of and pointers to LAION datasets
☆255 · Nov 5, 2022 · Updated 3 years ago
LAION-AI / laion-dreams
Aim for the moon. If you miss, you may hit a star.
☆168 · Feb 14, 2023 · Updated 3 years ago
rom1504 / embedding-reader
Efficiently read embeddings in streaming mode from any filesystem
☆106 · Aug 9, 2025 · Updated 11 months ago
mlfoundations / datacomp
DataComp: In search of the next generation of multimodal datasets
☆787 · Apr 28, 2025 · Updated last year
dzryk / antarctic-captions
☆110 · Aug 5, 2021 · Updated 4 years ago
LAION-AI / laion-dedup
☆18 · Nov 7, 2022 · Updated 3 years ago
AranKomat / Diff-DALLE
☆65 · Nov 4, 2021 · Updated 4 years ago
crowsonkb / cond_transformer_2
A CLIP conditioned Decision Transformer.
☆22 · Jul 14, 2021 · Updated 5 years ago
rom1504 / cc2dataset
Easily convert Common Crawl into a dataset of captions and documents: image/text, audio/text, video/text, ...
☆321 · Dec 9, 2023 · Updated 2 years ago
afiaka87 / laionide
Checkpoints for GLIDE fine-tuned on LAION and other datasets (work in progress).
☆50 · Aug 17, 2022 · Updated 3 years ago
pbaylies / clustering-laion400m
Script and models for clustering LAION-400m CLIP embeddings.
☆26 · Jan 10, 2022 · Updated 4 years ago
LAION-AI / CLIP_benchmark
CLIP-like model evaluation
☆815 · Updated this week
LAION-AI / laion50BU
Un-*** 50 billion multimodal dataset
☆24 · Sep 14, 2022 · Updated 3 years ago
FreddeFrallan / Multilingual-CLIP
OpenAI CLIP text encoders for multiple languages!
☆833 · May 15, 2023 · Updated 3 years ago
mlfoundations / open_clip
An open source implementation of CLIP.
☆14,020 · Jul 17, 2026 · Updated last week
dzryk / cliptalk
☆19 · Aug 19, 2021 · Updated 4 years ago
UCSC-VLAA / CLIPA
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
☆322 · Jun 3, 2024 · Updated 2 years ago
microsoft / VQ-Diffusion
Official implementation of VQ-Diffusion
☆981 · Apr 17, 2024 · Updated 2 years ago
TheoCoombes / ClipCap
Using pretrained encoder and language models to generate captions from multimedia inputs.
☆101 · Mar 11, 2023 · Updated 3 years ago
criteo / autofaiss
Automatically create Faiss k-NN indices with optimal similarity search parameters.
☆907 · Nov 4, 2025 · Updated 8 months ago
LAION-AI / dalle2-laion
Pretrained DALL-E 2 from LAION
☆505 · Apr 15, 2023 · Updated 3 years ago
defgsus / clipig
OpenAI CLIP based image generator with complex config file controlled transformation and training pipelines
☆19 · Jan 4, 2022 · Updated 4 years ago
ai-forever / ru-dolph
RUDOLPH: One Hyper-Tasking Transformer can be as creative as DALL-E and GPT-3 and as smart as CLIP
☆254 · Feb 6, 2023 · Updated 3 years ago
lucidrains / x-clip
A concise but complete implementation of CLIP with various experimental improvements from recent papers
☆724 · Oct 16, 2023 · Updated 2 years ago
robvanvolt / DALLE-datasets
This is a summary of easily available datasets for generalized DALLE-pytorch training.
☆130 · Apr 19, 2022 · Updated 4 years ago
rvencu / crawlingathome-gpu-hcloud
GPU-controlled swarm of Hetzner Cloud workers for the Crawling@Home project
☆58 · Oct 9, 2022 · Updated 3 years ago
kakaobrain / coyo-dataset
COYO-700M: Large-scale Image-Text Pair Dataset
☆1,256 · Nov 30, 2022 · Updated 3 years ago
tgisaturday / dalle-lightning
Refactoring dalle-pytorch and taming-transformers for TPU VMs
☆60 · Aug 30, 2021 · Updated 4 years ago
Newbeeer / stf
Code for ICLR 2023 paper "Stable Target Field for Reduced Variance Score Estimation in Diffusion Models"
☆76 · Jun 6, 2023 · Updated 3 years ago
LAION-AI / Big-Interleaved-Dataset
Big-Interleaved-Dataset
☆59 · Jan 21, 2023 · Updated 3 years ago
SHI-Labs / Versatile-Diffusion
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, arXiv 2022 / ICCV 2023
☆1,334 · Aug 10, 2023 · Updated 2 years ago
FacePerceiver / LAION-Face
The human face subset of LAION-400M for large-scale face pretraining.
☆320 · Feb 1, 2023 · Updated 3 years ago
dzryk / clip-grams
☆30 · Nov 25, 2021 · Updated 4 years ago
lucidrains / CLAP
Contrastive Language-Audio Pretraining
☆15 · May 18, 2021 · Updated 5 years ago
openai / vdvae
Repository for the paper "Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images"
☆453 · Apr 28, 2023 · Updated 3 years ago
Zasder3 / train-CLIP-FT
☆49 · Aug 2, 2021 · Updated 4 years ago
eliohead / glide-finetune-colab
Colab notebook to fine-tune GLIDE.
☆12 · Mar 22, 2022 · Updated 4 years ago