KellerJordan / top-sgd
Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization)
☆14 Updated 2 years ago
Alternatives and similar repositories for top-sgd
Users that are interested in top-sgd are comparing it to the libraries listed below
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆188 Updated last week
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 Updated 3 years ago
- Parameter-Free Optimizers for Pytorch ☆130 Updated last year
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆92 Updated last year
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆127 Updated 2 years ago
- ☆62 Updated last year
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆63 Updated 2 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆69 Updated last year
- Open source code for EigenGame. ☆33 Updated 2 years ago
- Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural netwo… ☆74 Updated 5 months ago
- Implementation of GateLoop Transformer in Pytorch and Jax ☆91 Updated last year
- ☆227 Updated last year
- Lightning-like training API for JAX with Flax ☆44 Updated last year
- Euclidean Wasserstein-2 optimal transportation ☆47 Updated 2 years ago
- ☆73 Updated last year
- Deep Networks Grok All the Time and Here is Why ☆38 Updated last year
- minGPT in JAX ☆48 Updated 3 years ago
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆40 Updated 2 years ago
- Code for papers Linear Algebra with Transformers (TMLR) and What is my Math Transformer Doing? (AI for Maths Workshop, Neurips 2022) ☆76 Updated last year
- ☆91 Updated last year
- This repository contains a Jax implementation of conformal training corresponding to the ICLR'22 paper "learning optimal conformal classi… ☆130 Updated 3 years ago
- Meta Optimal Transport ☆105 Updated 2 years ago
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023) ☆97 Updated last year
- A simple library for scaling up JAX programs ☆144 Updated last month
- nanoGPT-like codebase for LLM training ☆113 Updated last month
- ☆35 Updated last year
- Transformers with doubly stochastic attention ☆50 Updated 3 years ago
- ASDL: Automatic Second-order Differentiation Library for PyTorch ☆191 Updated last year
- Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks" ☆60 Updated 3 years ago
- [ICML 2024] SINGD: KFAC-like Structured Inverse-Free Natural Gradient Descent (http://arxiv.org/abs/2312.05705) ☆23 Updated last year