pytorch / torchft
PyTorch per-step fault tolerance (actively under development)
★243 · Updated this week
Alternatives and similar repositories for torchft:
Users interested in torchft are comparing it to the libraries listed below.
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ★221 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX ★221 · Updated 6 months ago
- LLM KV cache compression made easy ★394 · Updated this week
- Applied AI experiments and examples for PyTorch ★224 · Updated this week
- ★197 · Updated 3 weeks ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ★514 · Updated this week
- Scalable and Performant Data Loading ★217 · Updated last week
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ★172 · Updated last week
- Cataloging released Triton kernels. ★167 · Updated last month
- Fast low-bit matmul kernels in Triton ★232 · Updated this week
- ★177 · Updated this week
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ★229 · Updated this week
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ★187 · Updated this week
- ★133 · Updated this week
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ★154 · Updated 2 months ago
- ring-attention experiments ★123 · Updated 4 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ★75 · Updated this week
- Google TPU optimizations for transformers models ★98 · Updated 3 weeks ago
- extensible collectives library in triton ★82 · Updated 4 months ago
- Minimalistic 4D-parallelism distributed training framework for education purpose ★722 · Updated last week
- Efficient LLM Inference over Long Sequences ★357 · Updated this week
- Helpful tools and examples for working with flex-attention ★631 · Updated this week
- A library for unit scaling in PyTorch ★122 · Updated 2 months ago
- Small-scale distributed training of sequential deep learning models, built on Numpy and MPI. ★117 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ★259 · Updated 4 months ago
- seqax = sequence modeling + JAX ★143 · Updated 7 months ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ★274 · Updated this week
- Megatron's multi-modal data loader ★162 · Updated this week
- A library to analyze PyTorch traces. ★332 · Updated this week
- ★284 · Updated this week