KellerJordan / top-sgd
Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization)
☆14 · Updated 2 years ago
Alternatives and similar repositories for top-sgd
Users interested in top-sgd are comparing it to the libraries listed below.
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆188 · Updated last month
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆35 · Updated 3 years ago
- Open source code for EigenGame. ☆33 · Updated 2 years ago
- ☆61 · Updated last year
- Parameter-Free Optimizers for Pytorch ☆130 · Updated last year
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆127 · Updated 2 years ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆89 · Updated last year
- Euclidean Wasserstein-2 optimal transportation ☆47 · Updated 2 years ago
- nanoGPT-like codebase for LLM training ☆110 · Updated 2 weeks ago
- ☆223 · Updated 11 months ago
- Lightning-like training API for JAX with Flax ☆44 · Updated 11 months ago
- Experiment of using Tangent to autodiff triton ☆80 · Updated last year
- ☆39 · Updated last year
- Implementation of Denoising Diffusion Probabilistic Models (DDPM) in JAX and Flax. ☆20 · Updated 2 years ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch ☆25 · Updated 10 months ago
- Deep Networks Grok All the Time and Here is Why ☆37 · Updated last year
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆63 · Updated 2 years ago
- minGPT in JAX ☆48 · Updated 3 years ago
- LoRA for arbitrary JAX models and functions ☆143 · Updated last year
- A simple library for scaling up JAX programs ☆144 · Updated 2 weeks ago
- ☆91 · Updated last year
- Implementation of GateLoop Transformer in Pytorch and Jax ☆90 · Updated last year
- Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks" ☆59 · Updated 3 years ago
- ☆53 · Updated last year
- Maximal Update Parametrization (μP) with Flax & Optax. ☆16 · Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆68 · Updated last year
- Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural netwo… ☆74 · Updated 4 months ago
- supporting pytorch FSDP for optimizers ☆84 · Updated 11 months ago
- ☆72 · Updated 11 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆100 · Updated last year