pytorch / torchft
PyTorch per step fault tolerance (actively under development)
☆209 · Updated this week
Alternatives and similar repositories for torchft:
Users interested in torchft are comparing it to the libraries listed below.
- Scalable and Performant Data Loading ☆201 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆217 · Updated 5 months ago
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆213 · Updated this week
- ☆184 · Updated 3 weeks ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆505 · Updated 2 months ago
- LLM KV cache compression made easy ☆289 · Updated this week
- Minimalistic 4D-parallelism distributed training framework for education purposes ☆544 · Updated 3 weeks ago
- Cataloging released Triton kernels. ☆147 · Updated this week
- Applied AI experiments and examples for PyTorch ☆208 · Updated 3 weeks ago
- ☆82 · Updated this week
- ☆169 · Updated this week
- ☆138 · Updated 11 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆136 · Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆254 · Updated 3 weeks ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆214 · Updated this week
- ☆91 · Updated this week
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆256 · Updated this week
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆180 · Updated this week
- ☆75 · Updated 6 months ago
- ring-attention experiments ☆113 · Updated 2 months ago
- Best practices & guides on how to write distributed PyTorch training code ☆329 · Updated 3 weeks ago
- Efficient LLM Inference over Long Sequences ☆336 · Updated last week
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆207 · Updated this week
- TorchFix - a linter for PyTorch-using code with autofix support ☆116 · Updated this week
- Transform datasets at scale. Optimize datasets for fast AI model training. ☆393 · Updated this week
- Fast low-bit matmul kernels in Triton ☆185 · Updated this week
- Megatron's multi-modal data loader ☆155 · Updated last week
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆113 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 2 months ago
- ☆85 · Updated 10 months ago