pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆359 · Updated last week
Alternatives and similar repositories for torchft
Users interested in torchft are comparing it to the libraries listed below.
- PyTorch Single Controller ☆296 · Updated this week
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆255 · Updated this week
- Scalable and Performant Data Loading ☆287 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆561 · Updated 3 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated 11 months ago
- Load compute kernels from the Hub ☆203 · Updated this week
- LLM KV cache compression made easy ☆535 · Updated this week
- ☆214 · Updated 5 months ago
- Applied AI experiments and examples for PyTorch ☆281 · Updated last month
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆205 · Updated this week
- ☆160 · Updated last year
- ☆225 · Updated this week
- Fast low-bit matmul kernels in Triton ☆327 · Updated this week
- A library to analyze PyTorch traces. ☆391 · Updated this week
- ☆198 · Updated 5 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆468 · Updated this week
- ring-attention experiments ☆144 · Updated 8 months ago
- ☆310 · Updated 10 months ago
- Cataloging released Triton kernels. ☆242 · Updated 6 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆195 · Updated 2 months ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆354 · Updated last month
- Scalable toolkit for efficient model reinforcement ☆478 · Updated this week
- Implementation of Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆529 · Updated last month
- Pipeline Parallelism for PyTorch ☆769 · Updated 10 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆134 · Updated last year
- ☆320 · Updated 2 weeks ago
- Efficient LLM Inference over Long Sequences ☆382 · Updated 2 weeks ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆188 · Updated last month
- A tool to configure, launch and manage your machine learning experiments. ☆169 · Updated this week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆424 · Updated this week