ethansmith2000 / fsdp_optimizersLinks

supporting pytorch FSDP for optimizers

☆83

Alternatives and similar repositories for fsdp_optimizers

Users that are interested in fsdp_optimizers are comparing it to the libraries listed below

Sorting:

cloneofsimo / min-fsdp
☆91Updated last year
HomebrewML / HeavyBall
Efficient optimizers
☆274Updated last week
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆166Updated 3 months ago
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆86Updated 3 weeks ago
cloneofsimo / scaling-guide
WIP
☆93Updated last year
cloneofsimo / ezmup
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, dont use it for Adam
☆85Updated last year
dvruette / barrel-rec-pytorch
☆53Updated last year
nikhilvyas / SOAP
☆217Updated 10 months ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆130Updated 3 months ago
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆189Updated last year
fal-ai-community / NativeSparseAttention
research impl of Native Sparse Attention (2502.11089)
☆62Updated 8 months ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆194Updated 10 months ago
shikaiqiu / compute-better-spent
☆58Updated last year
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆131Updated 10 months ago
modula-systems / modula
🧱 Modula software package
☆287Updated 2 months ago
cloneofsimo / minSAE
☆30Updated 10 months ago
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆63Updated 2 weeks ago
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆116Updated last month
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆233Updated 3 weeks ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun
☆56Updated 7 months ago
MatX-inc / seqax
seqax = sequence modeling + JAX
☆167Updated 3 months ago
apple / ml-ademamix
☆67Updated 11 months ago
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆107Updated 5 months ago
mgmalek / efficient_cross_entropy
☆121Updated last year
cloneofsimo / zeroshampoo
☆34Updated last year
fal-ai / diffusion-speedrun
Focused on fast experimentation and simplicity
☆75Updated 9 months ago
evanatyourservice / kron_torch
An implementation of PSGD Kron second-order optimizer for PyTorch
☆96Updated 2 months ago
berlino / seq_icl
☆53Updated last year