A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
☆3,004 · Feb 9, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for webdataset
Users who are interested in webdataset are comparing it to the libraries listed below.
- Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine. ☆4,371 · Oct 19, 2025 · Updated 4 months ago
- FFCV: Fast Forward Computer Vision (and other ML workloads!) ☆2,985 · Jun 16, 2024 · Updated last year
- PyTorch extensions for high performance and large scale training. ☆3,400 · Apr 26, 2025 · Updated 10 months ago
- Fast and simple stream processing of files in tar files, useful for deep learning, big data, and many other applications. ☆135 · Dec 10, 2023 · Updated 2 years ago
- Flexible and powerful tensor operations for readable and reliable code (for PyTorch, JAX, TF and others) ☆9,415 · Feb 20, 2026 · Updated last week
- A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries. ☆1,249 · Updated this week
- An open source implementation of CLIP. ☆13,430 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ☆9,513 · Feb 26, 2026 · Updated last week
- A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep lear… ☆5,637 · Updated this week
- AIStore: scalable storage for AI applications ☆1,771 · Updated this week
- A small demonstration of using WebDataset with ImageNet and PyTorch Lightning ☆76 · Dec 19, 2023 · Updated 2 years ago
- A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch ☆8,926 · Feb 24, 2026 · Updated last week
- VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images. ☆3,295 · Mar 3, 2024 · Updated 2 years ago
- Hackable and optimized Transformers building blocks, supporting a composable construction. ☆10,353 · Feb 20, 2026 · Updated last week
- 🐍 Geometric Computer Vision Library for Spatial AI ☆11,093 · Feb 23, 2026 · Updated last week
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. ☆30,884 · Updated this week
- The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights --… ☆36,420 · Feb 26, 2026 · Updated last week
- Fast and memory-efficient exact attention ☆22,460 · Updated this week
- A PyTorch native platform for training generative AI models ☆5,098 · Updated this week
- COYO-700M: Large-scale Image-Text Pair Dataset ☆1,252 · Nov 30, 2022 · Updated 3 years ago
- ☆107 · Apr 15, 2021 · Updated 4 years ago
- Python 3.8+ toolbox for submitting jobs to Slurm ☆1,585 · Jan 14, 2026 · Updated last month
- DataComp: In search of the next generation of multimodal datasets ☆772 · Apr 28, 2025 · Updated 10 months ago
- Hydra is a framework for elegantly configuring complex applications ☆10,231 · Feb 7, 2026 · Updated 3 weeks ago
- Official PyTorch Implementation of "Scalable Diffusion Models with Transformers" ☆8,382 · May 31, 2024 · Updated last year
- Machine learning metrics for distributed, scalable PyTorch applications. ☆2,410 · Feb 26, 2026 · Updated last week
- Accessible large language models via k-bit quantization for PyTorch. ☆7,997 · Feb 26, 2026 · Updated last week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆41,706 · Updated this week
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ☆22,030 · Jan 23, 2026 · Updated last month
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆11,177 · Nov 18, 2024 · Updated last year
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆3,371 · May 19, 2025 · Updated 9 months ago
- Official DeiT repository ☆4,325 · Mar 15, 2024 · Updated last year
- 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch. ☆32,873 · Feb 26, 2026 · Updated last week
- NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024 ☆1,815 · Nov 27, 2025 · Updated 3 months ago
- Easily compute CLIP embeddings and build a CLIP retrieval system with them ☆2,732 · Aug 15, 2025 · Updated 6 months ago
- Ongoing research training transformer models at scale ☆15,461 · Updated this week
- An efficient video loader for deep learning with smart shuffling that's super easy to digest ☆2,427 · Jul 17, 2024 · Updated last year
- Flexible Python configuration system. The last one you will ever need. ☆2,348 · Nov 29, 2025 · Updated 3 months ago
- CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image ☆32,642 · Feb 18, 2026 · Updated 2 weeks ago