pytorch / torchft
PyTorch per-step fault tolerance (actively under development)
☆302 · Updated this week
Alternatives and similar repositories for torchft
Users interested in torchft are comparing it to the libraries listed below.
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ☆249 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX · ☆222 · Updated 10 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. · ☆544 · Updated this week
- Scalable and Performant Data Loading · ☆269 · Updated this week
- LLM KV cache compression made easy · ☆493 · Updated 3 weeks ago
- Applied AI experiments and examples for PyTorch · ☆271 · Updated this week
- Fast low-bit matmul kernels in Triton · ☆303 · Updated last week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS · ☆181 · Updated 3 weeks ago
- ☆188 · Updated 3 months ago
- KernelBench: Can LLMs Write GPU Kernels? A benchmark with Torch -> CUDA problems · ☆351 · Updated 3 weeks ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ☆196 · Updated this week
- ☆210 · Updated 4 months ago
- Kernels, of the mega variety · ☆184 · Updated this week
- Cataloging released Triton kernels. · ☆226 · Updated 4 months ago
- Scalable toolkit for efficient model reinforcement · ☆361 · Updated this week
- A library to analyze PyTorch traces. · ☆379 · Updated this week
- Collection of kernels written in the Triton language · ☆125 · Updated last month
- ☆210 · Updated this week
- Load compute kernels from the Hub · ☆139 · Updated this week
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … · ☆169 · Updated this week
- Perplexity GPU Kernels · ☆324 · Updated last week
- Helpful tools and examples for working with flex-attention · ☆802 · Updated last week
- Ring-attention experiments · ☆143 · Updated 7 months ago
- ☆156 · Updated last year
- Efficient LLM Inference over Long Sequences · ☆376 · Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ☆184 · Updated this week
- ☆308 · Updated 9 months ago
- Extensible collectives library in Triton · ☆87 · Updated 2 months ago
- Triton-based implementation of Sparse Mixture of Experts. · ☆216 · Updated 6 months ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training · ☆45 · Updated last week