KellerJordan / top-sgd
Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization)
☆14Updated last year
Alternatives and similar repositories for top-sgd
Users that are interested in top-sgd are comparing it to the libraries listed below
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs☆36Updated 2 years ago
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine…☆37Updated 2 years ago
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…☆175Updated this week
- ☆53Updated 8 months ago
- Transformers with doubly stochastic attention☆45Updated 2 years ago
- Euclidean Wasserstein-2 optimal transportation☆47Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]☆66Updated 8 months ago
- Fast training of unitary deep network layers from low-rank updates☆28Updated 2 years ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)☆124Updated last year
- Latest Weight Averaging (NeurIPS HITY 2022)☆30Updated last year
- Source code of "What can linearized neural networks actually say about generalization?"☆20Updated 3 years ago
- A system for automating selection and optimization of pre-trained models from the TAO Model Zoo☆25Updated 11 months ago
- ☆22Updated 2 years ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX☆81Updated last year
- Deep Networks Grok All the Time and Here is Why☆36Updated last year
- Collection of snippets for PyTorch users☆25Updated 3 years ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch☆25Updated 4 months ago
- Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks☆10Updated 11 months ago
- ☆32Updated 8 months ago
- Blog post☆17Updated last year
- ☆51Updated 11 months ago
- Supporting code for the paper "Bayesian Model Selection, the Marginal Likelihood, and Generalization".☆35Updated 2 years ago
- [TMLR 2022] Curvature access through the generalized Gauss-Newton's low-rank structure: Eigenvalues, eigenvectors, directional derivative…☆17Updated last year
- Code for PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization, NeurIPS 2022☆15Updated 2 years ago
- ☆67Updated 5 months ago
- Pytorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network☆50Updated 10 months ago
- JAX implementation of "Fine-Tuning Language Models with Just Forward Passes"☆19Updated last year
- Implementation of Action Matching for the Schrödinger equation☆24Updated last year
- ☆32Updated last year
- Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural netwo…☆70Updated last week