meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
⭐475 · Updated last week
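Among the strategies named in the description, LocalSGD is the simplest to illustrate: each worker takes several optimizer steps on its own data, and the workers only average their parameters at periodic synchronization points instead of all-reducing gradients every step. The sketch below is a minimal single-process simulation of that idea; it is not torchft's actual API, and `local_sgd`, `grad_fn`, and the toy quadratic objectives are all hypothetical names introduced here for illustration.

```python
# Conceptual sketch of LocalSGD (NOT torchft's API): each simulated worker
# takes `sync_every` local SGD steps, then all workers average their
# parameters. This cuts communication from every step to every H steps.

def local_sgd(params, grad_fn, lr=0.1, sync_every=4, rounds=3):
    """params: one scalar parameter per simulated worker.
    grad_fn(i, w): gradient of worker i's local objective at w."""
    params = list(params)
    for _ in range(rounds):
        # Each worker trains independently for `sync_every` steps.
        for i, w in enumerate(params):
            for _ in range(sync_every):
                w -= lr * grad_fn(i, w)
            params[i] = w
        # Synchronization point: average parameters across workers
        # (this would be an all-reduce in a real distributed job).
        avg = sum(params) / len(params)
        params = [avg] * len(params)
    return params

# Two workers minimizing shifted quadratics f_i(w) = (w - t_i)^2 / 2,
# whose gradient is (w - t_i); the averaged iterate drifts toward the
# mean of the targets as rounds increase.
targets = [1.0, 3.0]
final = local_sgd([0.0, 0.0], lambda i, w: w - targets[i])
print(final)
```

After every synchronization round all workers hold the same parameter value, which is the property torchft-style fault tolerance exploits: a replacement worker can rejoin at any sync boundary by copying the shared average.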
Alternatives and similar repositories for torchft
Users interested in torchft are comparing it to the libraries listed below.
- PyTorch Single Controller ⭐957 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ⭐280 · Updated 2 months ago
- Scalable and Performant Data Loading ⭐364 · Updated last week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ⭐739 · Updated this week
- Load compute kernels from the Hub ⭐397 · Updated this week
- A library to analyze PyTorch traces. ⭐462 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ⭐595 · Updated 5 months ago
- Applied AI experiments and examples for PyTorch ⭐315 · Updated 5 months ago
- Where GPUs get cooked 👩‍🍳🔥 ⭐363 · Updated 2 weeks ago
- A Quirky Assortment of CuTe Kernels ⭐781 · Updated this week
- Fast low-bit matmul kernels in Triton ⭐427 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX ⭐227 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ⭐219 · Updated last week
- LLM KV cache compression made easy ⭐876 · Updated last week
- ⭐322 · Updated last year
- kernels, of the mega variety ⭐665 · Updated last week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ⭐404 · Updated last month
- ⭐178 · Updated 2 years ago
- TPU inference for vLLM, with unified JAX and PyTorch support. ⭐228 · Updated this week
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ⭐334 · Updated 3 months ago
- ring-attention experiments ⭐165 · Updated last year
- ⭐286 · Updated last week
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ⭐262 · Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ⭐155 · Updated 2 years ago
- ⭐232 · Updated 2 months ago
- torchcomms: a modern PyTorch communications API ⭐327 · Updated this week
- Cataloging released Triton kernels. ⭐292 · Updated 5 months ago
- Pipeline Parallelism for PyTorch ⭐784 · Updated last year
- ⭐345 · Updated last week
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ⭐412 · Updated this week