KellerJordan / top-sgd
Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization)
☆12 Updated last year
Related projects
Alternatives and complementary repositories for top-sgd
- PyTorch implementation of preconditioned stochastic gradient descent (affine group preconditioner, low-rank approximation preconditioner … ☆128 Updated last month
- ☆46 Updated last month
- Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural netwo… ☆64 Updated this week
- A system for automating selection and optimization of pre-trained models from the TAO Model Zoo ☆22 Updated 4 months ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 Updated 2 years ago
- Code repo for ICLR 24 BlogPost titled "Building Diffusion Model's theory from ground up" ☆13 Updated 11 months ago
- PyTorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network ☆47 Updated 3 months ago
- ☆128 Updated this week
- Euclidean Wasserstein-2 optimal transportation ☆44 Updated last year
- ☆48 Updated 9 months ago
- Meta Optimal Transport ☆97 Updated last year
- Simple Scalable Discrete Diffusion for text in PyTorch ☆28 Updated last month
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data. ☆17 Updated 2 weeks ago
- Flow-matching algorithms in JAX ☆77 Updated 3 months ago
- ☆29 Updated 2 months ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 Updated 2 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆35 Updated 4 months ago
- ☆40 Updated 4 months ago
- Open source code for EigenGame. ☆28 Updated last year
- Experiment in using Tangent to autodiff Triton ☆72 Updated 9 months ago
- ☆48 Updated last week
- Scalable neural net training via automatic normalization in the modular norm. ☆121 Updated 3 months ago
- Simple BibTeX generator for any text with \cite{} ☆31 Updated 4 months ago
- Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks" ☆60 Updated 2 years ago
- A basic pure PyTorch implementation of flash attention ☆16 Updated 3 weeks ago
- ☆73 Updated 4 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆79 Updated 9 months ago
- Transformers with doubly stochastic attention ☆40 Updated 2 years ago
- FID computation in JAX/Flax. ☆24 Updated 4 months ago