pytorch / torchft
PyTorch per-step fault tolerance (actively under development)
⭐291 · Updated this week
Alternatives and similar repositories for torchft:
Users interested in torchft are comparing it to the libraries listed below.
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ⭐244 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ⭐224 · Updated 9 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ⭐536 · Updated this week
- Scalable and Performant Data Loading ⭐247 · Updated this week
- Applied AI experiments and examples for PyTorch ⭐262 · Updated last week
- LLM KV cache compression made easy ⭐471 · Updated this week
- Fast low-bit matmul kernels in Triton ⭐295 · Updated this week
- Perplexity GPU Kernels ⭐272 · Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ⭐169 · Updated last month
- Efficient LLM Inference over Long Sequences ⭐372 · Updated this week
- ⭐181 · Updated 2 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ⭐288 · Updated last week
- ⭐209 · Updated 3 months ago
- A library to analyze PyTorch traces. ⭐367 · Updated last week
- Cataloging released Triton kernels. ⭐220 · Updated 3 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ⭐180 · Updated last week
- ⭐304 · Updated 8 months ago
- ⭐202 · Updated last week
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ⭐194 · Updated this week
- A tool to configure, launch and manage your machine learning experiments. ⭐144 · Updated this week
- Load compute kernels from the Hub ⭐115 · Updated last week
- ring-attention experiments ⭐132 · Updated 6 months ago
- ⭐155 · Updated last year
- Collection of kernels written in Triton language ⭐120 · Updated last month
- Helpful tools and examples for working with flex-attention ⭐746 · Updated 3 weeks ago
- Where GPUs get cooked ⭐226 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐262 · Updated 6 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ⭐157 · Updated 4 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ⭐291 · Updated this week
- Transform datasets at scale. Optimize datasets for fast AI model training. ⭐466 · Updated this week