Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆478 · Feb 3, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for torchft
Users who are interested in torchft compare it to the libraries listed below.
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆264 · Updated this week
- A PyTorch native platform for training generative AI models ☆5,098 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆280 · Nov 24, 2025 · Updated 3 months ago
- ☆21 · Mar 3, 2025 · Updated 11 months ago
- PyTorch native quantization and sparsity for training and inference ☆2,707 · Updated this week
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆164 · Jan 12, 2026 · Updated last month
- Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs ☆938 · Nov 27, 2025 · Updated 3 months ago
- A library to analyze PyTorch traces. ☆467 · Feb 4, 2026 · Updated 3 weeks ago
- Minimalistic large language model 3D-parallelism training ☆2,579 · Feb 19, 2026 · Updated last week
- Helpful tools and examples for working with flex-attention ☆1,136 · Feb 8, 2026 · Updated 3 weeks ago
- PyTorch Single Controller ☆981 · Updated this week
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆327 · Updated this week
- ☆124 · May 28, 2024 · Updated last year
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆417 · Updated this week
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆143 · Sep 12, 2025 · Updated 5 months ago
- extensible collectives library in triton ☆95 · Mar 31, 2025 · Updated 11 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ☆3,176 · Updated this week
- Fast low-bit matmul kernels in Triton ☆433 · Feb 1, 2026 · Updated last month
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆595 · Aug 12, 2025 · Updated 6 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆226 · Aug 1, 2024 · Updated last year
- A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters. ☆922 · Updated this week
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 · May 8, 2024 · Updated last year
- Distributed Compiler based on Triton for Parallel Systems ☆1,371 · Feb 13, 2026 · Updated 2 weeks ago
- CUDA checkpoint and restore utility ☆424 · Sep 15, 2025 · Updated 5 months ago
- Torch Distributed Experimental ☆117 · Aug 5, 2024 · Updated last year
- Ring attention implementation with flash attention ☆986 · Sep 10, 2025 · Updated 5 months ago
- TorchFix - a linter for PyTorch-using code with autofix support ☆152 · Aug 23, 2025 · Updated 6 months ago
- Scalable and Performant Data Loading ☆368 · Updated this week
- Zero Bubble Pipeline Parallelism ☆451 · May 7, 2025 · Updated 9 months ago
- Pipeline Parallelism for PyTorch ☆785 · Aug 21, 2024 · Updated last year
- An experimental implementation of compiler-driven automatic sharding of models across a given device mesh. ☆52 · Updated this week
- Framework to reduce autotune overhead to zero for well known deployments. ☆97 · Sep 19, 2025 · Updated 5 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆106 · Jun 28, 2025 · Updated 8 months ago
- Efficient Triton Kernels for LLM Training ☆6,162 · Updated this week
- TORCH_TRACE parser for PT2 ☆78 · Updated this week
- ☆323 · Aug 20, 2024 · Updated last year
- Tile primitives for speedy kernels ☆3,183 · Updated this week
- ☆580 · Sep 23, 2025 · Updated 5 months ago