KellerJordan / top-sgd
Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization)
☆12 Updated last year
Alternatives and similar repositories for top-sgd:
Users who are interested in top-sgd are comparing it to the libraries listed below.
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 Updated 2 years ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆28 Updated last year
- ☆52 Updated 4 months ago
- ☆40 Updated 2 months ago
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆168 Updated 2 months ago
- FID computation in Jax/Flax. ☆27 Updated 7 months ago
- Open source code for EigenGame. ☆30 Updated last year
- A system for automating selection and optimization of pre-trained models from the TAO Model Zoo ☆24 Updated 7 months ago
- ☆21 Updated 2 years ago
- ☆33 Updated 5 months ago
- Euclidean Wasserstein-2 optimal transportation ☆44 Updated last year
- JAX Implementation of Black Forest Labs' Flux.1 family of models ☆29 Updated 4 months ago
- Fast training of unitary deep network layers from low-rank updates ☆28 Updated 2 years ago
- ☆49 Updated last year
- ☆30 Updated 3 months ago
- Code for PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization, NeurIPS 2022 ☆15 Updated 2 years ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch ☆23 Updated last month
- JAX implementation of "Fine-Tuning Language Models with Just Forward Passes" ☆19 Updated last year
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆27 Updated 4 years ago
- ☆159 Updated 2 months ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆58 Updated last year
- Implementation of GateLoop Transformer in Pytorch and Jax ☆87 Updated 8 months ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆122 Updated last year
- TabDPT: Scaling Tabular Foundation Models ☆25 Updated 2 weeks ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 Updated last year
- Parameter-Free Optimizers for Pytorch ☆113 Updated 9 months ago
- ☆55 Updated 3 months ago
- ☆11 Updated 9 months ago
- ☆59 Updated 2 years ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 Updated 2 years ago