foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2.
☆249 · Updated this week
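As context for the description above, here is a minimal, hypothetical sketch of what "native PyTorch FSDP plus an SDPA flash-attention path" looks like in practice. It is not taken from the fms-fsdp codebase; the function names and wrapping details are illustrative assumptions.

```python
# Minimal sketch, assuming a CUDA GPU and a torchrun-launched process group;
# nothing here is copied from fms-fsdp itself.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # SDPA dispatches to a fused backend (Flash Attention v2, memory-efficient,
    # or math) based on dtype, device, and tensor shapes.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)


def shard_model(model: nn.Module) -> FSDP:
    # Assumes the distributed environment is set up, e.g. launched via torchrun.
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")
    return FSDP(model.cuda(), use_orig_params=True)
```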
Alternatives and similar repositories for fms-fsdp
Users interested in fms-fsdp are comparing it to the libraries listed below.
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆196 · Updated this week
- Triton-based implementation of Sparse Mixture of Experts. ☆216 · Updated 6 months ago
- Scalable toolkit for efficient model reinforcement ☆361 · Updated this week
- Applied AI experiments and examples for PyTorch ☆271 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆222 · Updated 10 months ago
- ☆188 · Updated 3 months ago
- Ring attention implementation with flash attention ☆771 · Updated last week
- Large Context Attention ☆711 · Updated 4 months ago
- ring-attention experiments ☆143 · Updated 7 months ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆208 · Updated 9 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆514 · Updated 2 weeks ago
- Fast low-bit matmul kernels in Triton ☆303 · Updated last week
- LLM KV cache compression made easy ☆493 · Updated 3 weeks ago
- Load compute kernels from the Hub ☆139 · Updated this week
- PyTorch per-step fault tolerance (actively under development) ☆302 · Updated last week
- Explorations into some recent techniques surrounding speculative decoding ☆266 · Updated 5 months ago
- ☆108 · Updated last year
- Megatron's multi-modal data loader ☆204 · Updated last week
- kernels, of the mega variety ☆184 · Updated this week
- Zero Bubble Pipeline Parallelism ☆395 · Updated 3 weeks ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆203 · Updated 2 weeks ago
- Cataloging released Triton kernels. ☆226 · Updated 4 months ago
- 🔥 A minimal training framework for scaling FLA models ☆146 · Updated 3 weeks ago
- Collection of kernels written in Triton language ☆125 · Updated last month
- ☆193 · Updated 3 weeks ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆237 · Updated 4 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆544 · Updated this week
- Normalized Transformer (nGPT) ☆181 · Updated 6 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆122 · Updated 5 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆117 · Updated this week