KellerJordan / top-sgd
Optimization algorithm that fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization)
☆14 · Updated last year
Alternatives and similar repositories for top-sgd:
Users interested in top-sgd are comparing it to the repositories listed below.
- Euclidean Wasserstein-2 optimal transportation ☆47 · Updated last year
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 · Updated 2 years ago
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆36 · Updated 2 years ago
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆174 · Updated this week
- ☆53 · Updated 7 months ago
- ☆30 · Updated 5 months ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆123 · Updated last year
- Transformers with doubly stochastic attention ☆45 · Updated 2 years ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆60 · Updated last year
- Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural netwo… ☆70 · Updated last week
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 · Updated 7 months ago
- Parameter-Free Optimizers for Pytorch ☆126 · Updated last year
- JAX implementation of "Fine-Tuning Language Models with Just Forward Passes" ☆19 · Updated last year
- Implementation of PSGD optimizer in JAX ☆33 · Updated 4 months ago
- ☆53 · Updated 9 months ago
- ☆32 · Updated 7 months ago
- ☆49 · Updated last year
- Code for PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization, NeurIPS 2022 ☆15 · Updated 2 years ago
- Fast training of unitary deep network layers from low-rank updates ☆28 · Updated 2 years ago
- Flexible meta-learning in JAX ☆13 · Updated last year
- Deep Networks Grok All the Time and Here is Why ☆34 · Updated 11 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs ☆33 · Updated last year
- Lightning-like training API for JAX with Flax ☆38 · Updated 5 months ago
- A collection of meta-learning algorithms in JAX ☆23 · Updated 2 years ago
- ☆26 · Updated last year
- ☆18 · Updated last year
- ☆51 · Updated 11 months ago
- ☆16 · Updated 2 years ago
- Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks ☆10 · Updated 10 months ago
- Supporting code for the paper "Bayesian Model Selection, the Marginal Likelihood, and Generalization" ☆35 · Updated 2 years ago