foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and the SDPA implementation of Flash Attention v2.
☆270 · Updated 3 months ago
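A minimal sketch of the two PyTorch-native pieces the description mentions: wrapping a module in FSDP and computing attention through `torch.nn.functional.scaled_dot_product_attention` (which can dispatch to a Flash Attention v2 kernel on supported GPUs). This is an illustration only, not fms-fsdp's actual code; the `ToyAttention` module and `build_model` helper are hypothetical.

```python
# Sketch only: FSDP sharding + SDPA-backed attention with native PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


class ToyAttention(nn.Module):  # hypothetical module, for illustration only
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape (b, t, d) -> (b, n_heads, t, head_dim), the layout SDPA expects.
        q, k, v = (z.view(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
                   for z in (q, k, v))
        # SDPA picks a fused kernel (Flash Attention v2 where available).
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out.transpose(1, 2).reshape(b, t, d))


def build_model() -> nn.Module:
    # Assumes torch.distributed is already initialized (e.g. launched via torchrun).
    model = ToyAttention(dim=1024, n_heads=16).cuda()
    # FSDP shards parameters, gradients, and optimizer state across ranks.
    return FSDP(model)
```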
Alternatives and similar repositories for fms-fsdp
Users interested in fms-fsdp are comparing it to the libraries listed below.
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆215 · Updated last week
- Load compute kernels from the Hub ☆304 · Updated last week
- Triton-based implementation of Sparse Mixture of Experts. ☆246 · Updated 3 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆223 · Updated last year
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆542 · Updated 5 months ago
- Large Context Attention ☆743 · Updated last week
- Applied AI experiments and examples for PyTorch ☆299 · Updated 2 months ago
- ☆222 · Updated 3 weeks ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆420 · Updated last week
- ring-attention experiments ☆154 · Updated last year
- Fast low-bit matmul kernels in Triton ☆381 · Updated 3 weeks ago
- Efficient LLM Inference over Long Sequences ☆390 · Updated 3 months ago
- LLM KV cache compression made easy ☆660 · Updated last week
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆216 · Updated last year
- 🔥 A minimal training framework for scaling FLA models ☆266 · Updated last month
- Cataloging released Triton kernels. ☆263 · Updated last month
- ☆534 · Updated last month
- ☆121 · Updated last year
- Explorations into some recent techniques surrounding speculative decoding ☆288 · Updated 10 months ago
- Megatron's multi-modal data loader ☆252 · Updated last week
- Microsoft Automatic Mixed Precision Library ☆626 · Updated last year
- Training library for Megatron-based models ☆125 · Updated this week
- A Quirky Assortment of CuTe Kernels ☆627 · Updated last week
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆111 · Updated last week
- ☆205 · Updated 5 months ago
- ☆130 · Updated 4 months ago
- Collection of kernels written in the Triton language ☆157 · Updated 6 months ago
- ☆240 · Updated this week
- Flash-Muon: An Efficient Implementation of the Muon Optimizer ☆195 · Updated 4 months ago
- Triton implementation of FlashAttention2 that adds custom masks. ☆141 · Updated last year