cloneofsimo/min-fsdp

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/cloneofsimo/min-fsdp)

cloneofsimo / min-fsdp

☆93

Alternatives and similar repositories for min-fsdp

Users that are interested in min-fsdp are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

cloneofsimo / zeroshampoo
View on GitHub
☆33Sep 10, 2024Updated last year
cloneofsimo / min-max-gpt
View on GitHub
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Apr 17, 2024Updated 2 years ago
ethansmith2000 / fsdp_optimizers
View on GitHub
supporting pytorch FSDP for optimizers
☆84Dec 8, 2024Updated last year
cloneofsimo / scaling-guide
View on GitHub
WIP
☆96Aug 13, 2024Updated last year
srush / tangent
View on GitHub
Source-to-Source Debuggable Derivatives in Pure Python
☆15Jan 23, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
MatX-inc / seqax
View on GitHub
seqax = sequence modeling + JAX
☆195Jul 23, 2025Updated 11 months ago
ethansmith2000 / TransformerExperiments
View on GitHub
☆19Dec 4, 2025Updated 7 months ago
JonasGeiping / linear_cross_entropy_loss
View on GitHub
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆75Aug 2, 2024Updated last year
cloneofsimo / minRF
View on GitHub
Minimal implementation of scalable rectified flow transformers, based on SD3's approach
☆640Jul 1, 2024Updated 2 years ago
cloneofsimo / repa-rf
View on GitHub
☆32Nov 4, 2024Updated last year
HazyResearch / train-tk
View on GitHub
train with kittens!
☆66Oct 25, 2024Updated last year
Birch-san / booru-embed
View on GitHub
[WIP] Transformer to embed Danbooru labelsets
☆13Mar 31, 2024Updated 2 years ago
cloneofsimo / ezmup
View on GitHub
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, dont use it for Adam
☆88Jul 28, 2024Updated last year
cloneofsimo / efae
View on GitHub
☆24Jun 18, 2024Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
siboehm / ShallowSpeed
View on GitHub
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆165Oct 19, 2023Updated 2 years ago
foundation-model-stack / fms-fsdp
View on GitHub
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆288Nov 24, 2025Updated 7 months ago
crowsonkb / torch-dist-utils
View on GitHub
Utilities for PyTorch distributed
☆26Feb 27, 2025Updated last year
srush / mamba-scans
View on GitHub
Blog post
☆17Feb 16, 2024Updated 2 years ago
cloneofsimo / minSAE
View on GitHub
☆30Dec 2, 2024Updated last year
mnoukhov / async_rlhf
View on GitHub
Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models
☆68Mar 5, 2026Updated 4 months ago
GindaChen / FlexFlashAttention3
View on GitHub
FlexAttention w/ FlashAttention3 Support
☆27Oct 5, 2024Updated last year
EleutherAI / rnngineering
View on GitHub
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆33May 25, 2024Updated 2 years ago
MekkCyber / TritonAcademy
View on GitHub
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆208Jun 1, 2025Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
Doraemonzzz / nanoTransNormer
View on GitHub
☆11Oct 11, 2023Updated 2 years ago
ermongroup / fast_feedforward_computation
View on GitHub
Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021
☆30Sep 25, 2021Updated 4 years ago
proger / accelerated-scan
View on GitHub
Accelerated First Order Parallel Associative Scan
☆198Jan 7, 2026Updated 6 months ago
callummcdougall / sae-exercises-mats
View on GitHub
☆26Dec 20, 2023Updated 2 years ago
TiledTensor / TiledBench
View on GitHub
Benchmark tests supporting the TiledCUDA library.
☆19Nov 19, 2024Updated last year
mgmalek / efficient_cross_entropy
View on GitHub
☆124May 28, 2024Updated 2 years ago
UmerHA / triton_util
View on GitHub
Make triton easier
☆49Jun 12, 2024Updated 2 years ago
samblouir / birdie
View on GitHub
☆15Jun 8, 2026Updated last month
glassroom / heinsen_attention
View on GitHub
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆25Jun 6, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
TRI-ML / linear_open_lm
View on GitHub
A repository for research on medium sized language models.
☆78May 23, 2024Updated 2 years ago
OpenNLPLab / ETSC-Exact-Toeplitz-to-SSM-Conversion
View on GitHub
[EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion our EMNLP 2023 paper - Accelerating Toeplitz…
☆14Oct 17, 2023Updated 2 years ago
AlirezaMorsali / MLP-Attention
View on GitHub
☆17Dec 19, 2024Updated last year
google-deepmind / nanodo
View on GitHub
☆304Jul 15, 2024Updated 2 years ago
jemisjoky / umps_code
View on GitHub
u-MPS implementation and experimentation code used in the paper Tensor Networks for Probabilistic Sequence Modeling (https://arxiv.org/ab…
☆19Jul 2, 2020Updated 6 years ago
test-time-training / ttt-tk
View on GitHub
☆45Nov 1, 2025Updated 8 months ago
cloneofsimo / karras-power-ema-tutorial
View on GitHub
☆53Jan 6, 2024Updated 2 years ago