pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
⭐ 366 · Updated last week
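Since torchft itself is the subject of this page, a minimal sketch of how its fault-tolerant data-parallel training loop is typically wired may help when comparing the alternatives below. The `Manager`, `DistributedDataParallel`, `Optimizer`, and `ProcessGroupGloo` names follow the project's quickstart example as an assumption; exact signatures may differ between releases, so treat this as an illustrative sketch rather than canonical usage.

```python
# Hedged sketch of fault-tolerant DDP training with torchft.
# Class names and arguments follow the project's quickstart and are
# assumptions here; check the torchft README for the current API.
import torch
import torch.nn as nn
import torch.optim as optim

from torchft import Manager, DistributedDataParallel, Optimizer, ProcessGroupGloo

model = nn.Linear(2, 3)
base_optim = optim.AdamW(model.parameters())

# The Manager coordinates replica groups (lighthouse/coordination settings are
# assumed to come from the environment) and heals failed or rejoining replicas
# from a live peer's state_dict.
manager = Manager(
    pg=ProcessGroupGloo(),
    min_replica_size=1,
    load_state_dict=lambda state: model.load_state_dict(state),
    state_dict=lambda: model.state_dict(),
)

# The wrappers route gradient allreduce and optimizer steps through the
# Manager, so a step only counts if the current quorum agrees it succeeded.
model = DistributedDataParallel(manager, model)
optimizer = Optimizer(manager, base_optim)

for _ in range(10):
    batch = torch.rand(8, 2)
    optimizer.zero_grad()   # in the quickstart, this also starts a new quorum/step
    loss = model(batch).sum()
    loss.backward()
    optimizer.step()        # committed only if the quorum is still healthy
```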
Alternatives and similar repositories for torchft
Users interested in torchft are comparing it to the libraries listed below.
- PyTorch Single Controller · ⭐ 341 · Updated this week
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ⭐ 258 · Updated last week
- Scalable and Performant Data Loading · ⭐ 290 · Updated last week
- Load compute kernels from the Hub · ⭐ 214 · Updated last week
- LLM KV cache compression made easy · ⭐ 560 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. · ⭐ 563 · Updated 2 weeks ago
- ⭐ 313 · Updated 11 months ago
- This repository contains the experimental PyTorch native float8 training UX · ⭐ 224 · Updated last year
- Fast low-bit matmul kernels in Triton · ⭐ 338 · Updated this week
- Applied AI experiments and examples for PyTorch · ⭐ 289 · Updated 2 months ago
- A library to analyze PyTorch traces. · ⭐ 398 · Updated last week
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ⭐ 207 · Updated last week
- ⭐ 227 · Updated this week
- ⭐ 214 · Updated 6 months ago
- A Quirky Assortment of CuTe Kernels · ⭐ 374 · Updated this week
- ⭐ 162 · Updated last year
- Efficient LLM Inference over Long Sequences · ⭐ 385 · Updated last month
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS · ⭐ 203 · Updated 2 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems · ⭐ 493 · Updated last week
- Where GPUs get cooked 👩‍🍳🔥 · ⭐ 237 · Updated this week
- ring-attention experiments · ⭐ 145 · Updated 9 months ago
- ⭐ 323 · Updated last month
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ⭐ 188 · Updated 2 months ago
- A tool to configure, launch and manage your machine learning experiments. · ⭐ 174 · Updated this week
- ⭐ 203 · Updated 5 months ago
- Cataloging released Triton kernels. · ⭐ 246 · Updated 6 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. · ⭐ 137 · Updated last year
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the… · ⭐ 194 · Updated last week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… · ⭐ 364 · Updated last month
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… · ⭐ 378 · Updated this week