pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆383 · Updated last week
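For context on what torchft coordinates, here is a minimal, library-agnostic sketch of the LocalSGD pattern named above: each worker takes several purely local optimizer steps, then all workers average their parameters. This is an illustrative assumption written against plain torch.distributed, not torchft's own API; torchft's contribution is the fault-tolerance layer (quorum, live changes in worker membership, and recovery) wrapped around loops like this.

```python
# Minimal, library-agnostic sketch of the LocalSGD idea (illustrative only;
# this is NOT torchft's API). Assumes torch.distributed is already initialized,
# e.g. via torchrun, and that every rank starts from an identical model replica.
import torch
import torch.distributed as dist
import torch.nn.functional as F


def local_sgd_epoch(model: torch.nn.Module,
                    optimizer: torch.optim.Optimizer,
                    batches,
                    sync_every: int = 32) -> None:
    """Take `sync_every` purely local steps, then average parameters across ranks."""
    model.train()
    for step, (x, y) in enumerate(batches):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()

        # Periodic synchronization: one all-reduce per parameter tensor,
        # averaging the locally diverged replicas back together.
        if (step + 1) % sync_every == 0:
            world_size = dist.get_world_size()
            with torch.no_grad():
                for p in model.parameters():
                    dist.all_reduce(p.data, op=dist.ReduceOp.SUM)
                    p.data.div_(world_size)
```

Roughly speaking, DiLoCo-style variants keep the same inner loop but replace the plain parameter average with an outer optimizer applied to the pseudo-gradient (the gap between pre-sync and post-sync parameters), and Streaming DiLoCo overlaps that synchronization with ongoing local steps.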
Alternatives and similar repositories for torchft
Users interested in torchft are comparing it to the libraries listed below.
- PyTorch Single Controller ☆361 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆260 · Updated 3 weeks ago
- Scalable and Performant Data Loading ☆291 · Updated last week
- Load compute kernels from the Hub ☆244 · Updated this week
- LLM KV cache compression made easy ☆586 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆568 · Updated last week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆208 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated last year
- Applied AI experiments and examples for PyTorch ☆290 · Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆349 · Updated this week
- ring-attention experiments ☆149 · Updated 10 months ago
- ☆211 · Updated 6 months ago
- A library to analyze PyTorch traces. ☆404 · Updated last week
- ☆162 · Updated last year
- ☆314 · Updated last year
- A Quirky Assortment of CuTe Kernels ☆407 · Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆190 · Updated 2 months ago
- A tool to configure, launch and manage your machine learning experiments. ☆182 · Updated this week
- ☆232 · Updated this week
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆250 · Updated 2 weeks ago
- Efficient LLM Inference over Long Sequences ☆389 · Updated last month
- Where GPUs get cooked 👩‍🍳🔥 ☆277 · Updated 2 weeks ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆527 · Updated this week
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆206 · Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆211 · Updated 3 months ago
- ☆217 · Updated 7 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆260 · Updated this week
- Cataloging released Triton kernels. ☆252 · Updated 7 months ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆369 · Updated 2 months ago
- kernels, of the mega variety ☆472 · Updated 2 months ago