HomebrewML / HeavyBall
Efficient optimizers
★169 · Updated this week
Alternatives and similar repositories for HeavyBall:
Users interested in HeavyBall are comparing it to the libraries listed below.
- ★158 · Updated 2 months ago
- supporting pytorch FSDP for optimizers · ★76 · Updated 2 months ago
- Modula software package · ★139 · Updated this week
- Accelerated First Order Parallel Associative Scan · ★171 · Updated 5 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch · ★82 · Updated last week
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers · ★82 · Updated 7 months ago
- Muon optimizer: +~30% sample efficiency with <3% wallclock overhead · ★252 · Updated last week
- LoRA for arbitrary JAX models and functions · ★135 · Updated 11 months ago
- WIP · ★93 · Updated 6 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. · ★177 · Updated 5 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds · ★205 · Updated this week
- Focused on fast experimentation and simplicity · ★65 · Updated last month
- Understand and test language model architectures on synthetic tasks. · ★181 · Updated last month
- ★53 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. · ★95 · Updated 3 months ago
- ★208 · Updated 7 months ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam · ★73 · Updated 6 months ago
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… · ★168 · Updated 2 months ago
- DeMo: Decoupled Momentum Optimization · ★180 · Updated 2 months ago
- ★54 · Updated 3 months ago
- Normalized Transformer (nGPT) · ★152 · Updated 3 months ago
- ★75 · Updated 7 months ago
- JAX implementation of the Llama 2 model · ★215 · Updated last year
- seqax = sequence modeling + JAX · ★143 · Updated 7 months ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training · ★122 · Updated 10 months ago
- A library for unit scaling in PyTorch · ★122 · Updated 2 months ago
- A simple library for scaling up JAX programs · ★129 · Updated 3 months ago
- For optimization algorithm research and development. · ★490 · Updated last week
- Implementation of GateLoop Transformer in Pytorch and Jax · ★87 · Updated 8 months ago
- ★51 · Updated 4 months ago