KellerJordan / top-sgd
Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization)
☆12 Updated last year
Alternatives and similar repositories for top-sgd:
Users who are interested in top-sgd are comparing it to the libraries listed below.
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 Updated 2 years ago
- ☆50 Updated 3 months ago
- PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆148 Updated last month
- Euclidean Wasserstein-2 optimal transportation ☆44 Updated last year
- ☆52 Updated 2 months ago
- Laplace approximated BNN surrogate for BoTorch ☆8 Updated 2 months ago
- Meta Optimal Transport ☆98 Updated last year
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆22 Updated 2 weeks ago
- Automatically take good care of your preemptible TPUs ☆34 Updated last year
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆82 Updated 11 months ago
- Lightning-like training API for JAX with Flax ☆36 Updated last month
- ☆33 Updated last year
- Transformers with doubly stochastic attention ☆44 Updated 2 years ago
- Implementations and checkpoints for ResNet, Wide ResNet, ResNeXt, ResNet-D, and ResNeSt in JAX (Flax). ☆106 Updated 2 years ago
- Official repository for our ICLR 2021 paper Evaluating the Disentanglement of Deep Generative Models with Manifold Topology ☆35 Updated 3 years ago
- Implementation of GateLoop Transformer in PyTorch and JAX ☆87 Updated 7 months ago
- Open source code for EigenGame. ☆29 Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 Updated last year
- Automatic Integration for Neural Spatio-Temporal Point Process models (AI-STPP) is a new paradigm for exact, efficient, non-parametric inf… ☆24 Updated 3 months ago
- JAX implementation of "Fine-Tuning Language Models with Just Forward Passes" ☆19 Updated last year
- Code repo for ICLR 24 BlogPost titled "Building Diffusion Model's theory from ground up" ☆15 Updated last year
- Supporting PyTorch FSDP for optimizers ☆75 Updated last month
- Fast training of unitary deep network layers from low-rank updates ☆28 Updated 2 years ago
- ☆37 Updated last year
- ☆48 Updated 11 months ago
- A system for automating selection and optimization of pre-trained models from the TAO Model Zoo ☆24 Updated 6 months ago
- A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards. ☆40 Updated this week
- Supporting code for the paper "Bayesian Model Selection, the Marginal Likelihood, and Generalization". ☆35 Updated 2 years ago
- TorchDR - PyTorch Dimensionality Reduction ☆73 Updated this week
- ☆53 Updated last year