nikhilvyas / SOAP · Links
☆206 · Updated 7 months ago
Alternatives and similar repositories for SOAP
Users interested in SOAP are comparing it to the libraries listed below.
- Efficient optimizers · ☆252 · Updated last week
- Supporting PyTorch FSDP for optimizers · ☆84 · Updated 7 months ago
- 🧱 Modula software package · ☆210 · Updated this week
- ☆64 · Updated 8 months ago
- Accelerated First Order Parallel Associative Scan · ☆184 · Updated 11 months ago
- ☆53 · Updated 9 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds · ☆274 · Updated 2 weeks ago
- PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… · ☆180 · Updated last month
- The simplest, fastest repository for training/finetuning medium-sized GPTs. · ☆149 · Updated last month
- A library for unit scaling in PyTorch · ☆128 · Updated 3 weeks ago
- An implementation of the PSGD Kron second-order optimizer for PyTorch · ☆94 · Updated last week
- The AdEMAMix Optimizer: Better, Faster, Older. · ☆184 · Updated 10 months ago
- ☆115 · Updated last month
- A State-Space Model with Rational Transfer Function Representation. · ☆79 · Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 · ☆123 · Updated 7 months ago
- ☆274 · Updated last year
- ☆31 · Updated 3 weeks ago
- WIP · ☆93 · Updated 11 months ago
- nanoGPT-like codebase for LLM training · ☆102 · Updated 2 months ago
- Library for Jacobian descent with PyTorch. It enables the optimization of neural networks with multiple losses (e.g. multi-task learning)… · ☆260 · Updated this week
- ☆82 · Updated last year
- Understand and test language model architectures on synthetic tasks. · ☆221 · Updated 2 weeks ago
- Implementation of the PSGD optimizer in JAX · ☆34 · Updated 7 months ago
- LoRA for arbitrary JAX models and functions · ☆140 · Updated last year
- Minimal (400 LOC) implementation of maximum (multi-node, FSDP) GPT training · ☆130 · Updated last year
- Quick implementation of nGPT, learning entirely on the hypersphere, from NVIDIA AI · ☆288 · Updated last month
- A simple library for scaling up JAX programs · ☆140 · Updated 9 months ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD-only; don't use it for Adam. · ☆84 · Updated last year
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… · ☆124 · Updated last year
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule · ☆63 · Updated last year