ethansmith2000 / fsdp_optimizers
Supporting PyTorch FSDP for optimizers
☆79 · Updated 5 months ago
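For context on what "FSDP for optimizers" involves, here is a minimal sketch using the standard PyTorch API (`FullyShardedDataParallel` with `use_orig_params=True`), not this repository's code: the model is sharded across ranks and the optimizer is constructed over the locally sharded parameters, so optimizer state is sharded as well. The model and hyperparameters below are placeholders for illustration.

```python
# Minimal FSDP + optimizer sketch (standard PyTorch API, not fsdp_optimizers itself).
# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).cuda()          # placeholder model
fsdp_model = FSDP(model, use_orig_params=True)       # shards parameters across ranks

# The optimizer only sees each rank's local shard, so its state is sharded too.
optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-3)

x = torch.randn(8, 1024, device="cuda")
loss = fsdp_model(x).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```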
Alternatives and similar repositories for fsdp_optimizers
Users interested in fsdp_optimizers are comparing it to the libraries listed below.
- ☆78 · Updated 10 months ago
- WIP ☆93 · Updated 9 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆126 · Updated 3 weeks ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam. ☆77 · Updated 10 months ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆127 · Updated last year
- Efficient optimizers ☆206 · Updated this week
- ☆53 · Updated last year
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆93 · Updated 10 months ago
- Experiment of using Tangent to autodiff triton ☆78 · Updated last year
- Focused on fast experimentation and simplicity ☆73 · Updated 5 months ago
- 🧱 Modula software package ☆194 · Updated 2 months ago
- Research implementation of Native Sparse Attention (arXiv:2502.11089) ☆54 · Updated 3 months ago
- ☆182 · Updated 5 months ago
- Minimal but scalable implementation of large language models in JAX ☆34 · Updated 7 months ago
- seqax = sequence modeling + JAX ☆155 · Updated last month
- Understand and test language model architectures on synthetic tasks. ☆195 · Updated 2 months ago
- ☆33 · Updated 8 months ago
- ☆28 · Updated 6 months ago
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun ☆52 · Updated 2 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆116 · Updated 5 months ago
- LoRA for arbitrary JAX models and functions ☆135 · Updated last year
- ☆108 · Updated last year
- ☆80 · Updated last year
- A set of Python scripts that makes your experience on TPU better ☆53 · Updated 10 months ago
- ☆20 · Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers ☆66 · Updated last month
- Load compute kernels from the Hub ☆139 · Updated this week
- JAX bindings for Flash Attention v2 ☆88 · Updated 10 months ago
- ☆31 · Updated last month
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆100 · Updated 5 months ago