foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and the SDPA implementation of Flash Attention v2.
☆271 · Updated last week
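The SDPA feature mentioned in the tagline can be illustrated with a minimal sketch. PyTorch's `torch.nn.functional.scaled_dot_product_attention` dispatches to an efficient fused kernel (Flash Attention v2 on supported GPUs) automatically; the tensor shapes below are illustrative and not taken from fms-fsdp itself.

```python
import torch
import torch.nn.functional as F

# Toy attention inputs: (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 4, 128, 64)
k = torch.randn(2, 4, 128, 64)
v = torch.randn(2, 4, 128, 64)

# SDPA selects the best available backend (flash, memory-efficient,
# or math) for the device; is_causal=True applies a causal mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # same shape as q: (2, 4, 128, 64)
```

On CPU this falls back to the math/memory-efficient backends; the Flash Attention v2 path is used on CUDA devices that support it.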
Alternatives and similar repositories for fms-fsdp
Users interested in fms-fsdp are comparing it to the libraries listed below.
- This repository contains the experimental PyTorch native float8 training UX ☆223 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts ☆248 · Updated last month
- Load compute kernels from the Hub ☆326 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components ☆216 · Updated last week
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆543 · Updated 5 months ago
- Applied AI experiments and examples for PyTorch ☆303 · Updated 2 months ago
- ring-attention experiments ☆155 · Updated last year
- ☆225 · Updated 3 weeks ago
- Large Context Attention ☆748 · Updated last month
- Fast low-bit matmul kernels in Triton ☆392 · Updated 2 weeks ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆217 · Updated last year
- ☆121 · Updated last year
- Megatron's multi-modal data loader ☆260 · Updated last week
- Efficient LLM Inference over Long Sequences ☆390 · Updated 4 months ago
- Training library for Megatron-based models ☆174 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆446 · Updated this week
- ☆246 · Updated last week
- A Quirky Assortment of CuTe Kernels ☆651 · Updated 2 weeks ago
- Ring attention implementation with flash attention ☆906 · Updated 2 months ago
- Cataloging released Triton kernels ☆265 · Updated 2 months ago
- ☆545 · Updated last month
- ☆205 · Updated 6 months ago
- 🔥 A minimal training framework for scaling FLA models ☆291 · Updated 2 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton ☆582 · Updated 3 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆290 · Updated 10 months ago
- LLM KV cache compression made easy ☆680 · Updated this week
- ☆130 · Updated 5 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆206 · Updated 4 months ago
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core ☆125 · Updated this week
- Collection of kernels written in Triton language ☆164 · Updated 7 months ago