nikhilvyas / SOAP

☆225

Alternatives and similar repositories for SOAP

Users who are interested in SOAP are comparing it to the libraries listed below.

HomebrewML / HeavyBall
Efficient optimizers
☆276 · Updated 3 weeks ago
shikaiqiu / compute-better-spent
☆61 · Updated last year
lixilinx / psgd_torch
PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…
☆188 · Updated last month
ethansmith2000 / fsdp_optimizers
Supporting PyTorch FSDP for optimizers
☆84 · Updated 11 months ago
apple / ml-ademamix
☆68 · Updated last year
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆192 · Updated last year
modula-systems / modula
🧱 Modula software package
☆307 · Updated 3 months ago
KellerJordan / cifar10-airbench
CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds
☆327 · Updated 2 weeks ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174 · Updated 5 months ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆135 · Updated 11 months ago
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆186 · Updated last year
kvfrans / splus
☆119 · Updated 5 months ago
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆103 · Updated 2 months ago
evanatyourservice / kron_torch
An implementation of the PSGD Kron second-order optimizer for PyTorch
☆97 · Updated 4 months ago
google-deepmind / nanodo
☆285 · Updated last year
cloneofsimo / scaling-guide
WIP
☆93 · Updated last year
LIONS-EPFL / scion
☆48 · Updated last month
ruke1ire / RTF
A State-Space Model with Rational Transfer Function Representation.
☆83 · Updated last year
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆294 · Updated 6 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆194 · Updated last year
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132 · Updated 4 months ago
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆143 · Updated last year
cloneofsimo / min-fsdp
☆91 · Updated last year
riverstone496 / awesome-second-order-optimization
☆28 · Updated 2 months ago
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆100 · Updated last year
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆111 · Updated 3 weeks ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 · Updated last year
PolymathicAI / xVal
Repository for code used in the xVal paper
☆145 · Updated last year
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆240 · Updated 2 months ago
formll / dog
DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
☆63 · Updated 2 years ago