facebookresearch / spdl
Scalable and Performant Data Loading
☆360 Updated this week
Alternatives and similar repositories for spdl
Users who are interested in spdl compare it with the libraries listed below
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆465 Updated last week
- For optimization algorithm research and development. ☆556 Updated 2 weeks ago
- Speed up model training by fixing data loading. ☆566 Updated 3 weeks ago
- Load compute kernels from the Hub ☆357 Updated 3 weeks ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆590 Updated 4 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆277 Updated last month
- Helpful tools and examples for working with flex-attention ☆1,096 Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆227 Updated last year
- ☆569 Updated 3 months ago
- Efficient optimizers ☆280 Updated 2 weeks ago
- ☆314 Updated last year
- A tool to configure, launch and manage your machine learning experiments. ☆212 Updated this week
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ☆329 Updated 2 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆549 Updated 7 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆97 Updated 5 months ago
- Dion optimizer algorithm ☆413 Updated this week
- ☆225 Updated last month
- A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to fac… ☆245 Updated 2 weeks ago
- A library for unit scaling in PyTorch ☆133 Updated 5 months ago
- ☆341 Updated this week
- Best practices & guides on how to write distributed PyTorch training code ☆557 Updated 2 months ago
- ☆304 Updated 8 months ago
- TensorDict is a PyTorch-dedicated tensor container. ☆998 Updated this week
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆338 Updated last year
- ☆178 Updated last year
- Universal Notation for Tensor Operations in Python. ☆456 Updated 8 months ago
- Annotated version of the Mamba paper ☆493 Updated last year
- torchprime is a reference model implementation for PyTorch on TPU. ☆43 Updated 2 months ago
- Implementation of a Transformer, but completely in Triton ☆277 Updated 3 years ago
- Where GPUs get cooked 👩‍🍳🔥 ☆345 Updated 3 months ago