meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆410 Updated this week
Alternatives and similar repositories for torchft
Users who are interested in torchft are comparing it to the libraries listed below.
- PyTorch Single Controller ☆425 Updated this week
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆268 Updated 2 months ago
- Scalable and Performant Data Loading ☆304 Updated last week
- Load compute kernels from the Hub ☆290 Updated last week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆576 Updated last month
- A Quirky Assortment of CuTe Kernels ☆582 Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆224 Updated last year
- Fast low-bit matmul kernels in Triton ☆373 Updated last week
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆213 Updated this week
- LLM KV cache compression made easy ☆623 Updated this week
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆280 Updated last month
- A library to analyze PyTorch traces. ☆414 Updated last week
- ☆221 Updated 7 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆318 Updated this week
- Applied AI experiments and examples for PyTorch ☆296 Updated last month
- ☆240 Updated this week
- A tool to configure, launch and manage your machine learning experiments. ☆193 Updated last week
- ring-attention experiments ☆152 Updated 11 months ago
- Cataloging released Triton kernels. ☆261 Updated 3 weeks ago
- ☆314 Updated last year
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆581 Updated 2 weeks ago
- ☆173 Updated last year
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆142 Updated last year
- ☆217 Updated 8 months ago
- Learn CUDA with PyTorch ☆84 Updated last week
- kernels, of the mega variety ☆502 Updated last week
- Efficient LLM Inference over Long Sequences ☆391 Updated 3 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆226 Updated 4 months ago
- ☆331 Updated 3 weeks ago
- Module, Model, and Tensor Serialization/Deserialization ☆268 Updated last month