evanatyourservice / psgd_jax
Implementation of PSGD optimizer in JAX
☆19 Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for psgd_jax
- ☆16 Updated 3 months ago
- ☆53 Updated 10 months ago
- Automatically take good care of your preemptible TPUs ☆32 Updated last year
- ☆129 Updated last week
- ☆73 Updated 4 months ago
- Scalable neural net training via automatic normalization in the modular norm. ☆122 Updated this week
- Efficient optimizers ☆87 Updated this week
- seqax = sequence modeling + JAX ☆134 Updated 4 months ago
- PyTorch implementation of preconditioned stochastic gradient descent (affine group preconditioner, low-rank approximation preconditioner … ☆128 Updated last month
- LoRA for arbitrary JAX models and functions ☆133 Updated 8 months ago
- A simple library for scaling up JAX programs ☆127 Updated 3 weeks ago
- ☆48 Updated last week
- WIP ☆89 Updated 3 months ago
- Experiment of using Tangent to autodiff triton ☆72 Updated 10 months ago
- Accelerated First Order Parallel Associative Scan ☆164 Updated 3 months ago
- ☆50 Updated 6 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆113 Updated 7 months ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆29 Updated 3 weeks ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆16 Updated this week
- An implementation of the Llama architecture, to instruct and delight ☆21 Updated 3 months ago
- ☆18 Updated last month
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, don't use it for Adam ☆68 Updated 3 months ago
- ☆19 Updated 7 months ago
- A repo based on XiLin Li's PSGD repo that extends some of the experiments. ☆14 Updated last month
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆121 Updated this week
- ☆198 Updated 4 months ago
- ☆40 Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 Updated 6 months ago
- A library for unit scaling in PyTorch ☆105 Updated 2 weeks ago
- Flow-matching algorithms in JAX ☆77 Updated 3 months ago