KellerJordan / top-sgd
Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization)
☆12 Updated last year
Alternatives and similar repositories for top-sgd:
Users who are interested in top-sgd are comparing it to the libraries listed below.
- ☆52 Updated 5 months ago
- PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆171 Updated this week
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 Updated 2 years ago
- ☆42 Updated this week
- Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural netwo… ☆68 Updated last week
- Implementation of PSGD optimizer in JAX ☆30 Updated 2 months ago
- This is a port of the Mistral-7B model in JAX ☆32 Updated 8 months ago
- Deep Networks Grok All the Time and Here is Why ☆33 Updated 10 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in PyTorch ☆23 Updated 2 months ago
- JAX Implementation of Black Forest Labs' Flux.1 family of models ☆30 Updated 5 months ago
- ☆33 Updated 6 months ago
- Transformers with doubly stochastic attention ☆45 Updated 2 years ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 Updated 2 years ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆122 Updated last year
- ☆31 Updated 11 months ago
- FID computation in JAX/Flax. ☆27 Updated 8 months ago
- ☆59 Updated 4 months ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆29 Updated last year
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator ☆31 Updated last year
- Code accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆28 Updated 4 years ago
- ☆49 Updated last year
- JAX implementation of "Fine-Tuning Language Models with Just Forward Passes" ☆19 Updated last year
- PyTorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network ☆49 Updated 8 months ago
- ☆53 Updated last year
- ☆25 Updated last year
- ☆20 Updated 11 months ago
- Lightning-like training API for JAX with Flax ☆38 Updated 3 months ago
- Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks" ☆60 Updated 3 years ago
- Open source code for EigenGame. ☆30 Updated last year
- Flow-matching algorithms in JAX ☆86 Updated 7 months ago