optimizedlearning / mechanic
☆35 · Updated 7 months ago
Related projects:
- Repository for the "Gotta Go Fast When Generating Data with Score-Based Models" paper ☆101 · Updated 2 years ago
- Pytorch implementation of preconditioned stochastic gradient descent (affine group preconditioner, low-rank approximation preconditioner … ☆105 · Updated this week
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆119 · Updated 11 months ago
- ☆33 · Updated last year
- Transformers with doubly stochastic attention ☆40 · Updated 2 years ago
- scipy linear operators for the Hessian, Fisher/GGN, and more in PyTorch ☆17 · Updated this week
- Official code for "Maximum Likelihood Training of Score-Based Diffusion Models", NeurIPS 2021 (spotlight) ☆130 · Updated 2 years ago
- Sequence Modeling with Structured State Spaces ☆60 · Updated 2 years ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 · Updated 2 years ago
- Easy Hypernetworks in Pytorch and Jax ☆94 · Updated last year
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆94 · Updated last year
- Easy-to-use AdaHessian optimizer (PyTorch) ☆77 · Updated 3 years ago
- Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper ☆78 · Updated 2 years ago
- Implementations and checkpoints for ResNet, Wide ResNet, ResNeXt, ResNet-D, and ResNeSt in JAX (Flax). ☆103 · Updated 2 years ago
- ☆13 · Updated last year
- ☆47 · Updated last year
- [ICML 2024] SINGD: KFAC-like Structured Inverse-Free Natural Gradient Descent (http://arxiv.org/abs/2312.05705) ☆19 · Updated 2 months ago
- Code for ICLR 2021 Paper, "Anytime Sampling for Autoregressive Models via Ordered Autoencoding" ☆23 · Updated last year
- Layerwise Batch Entropy Regularization ☆22 · Updated 2 years ago
- ☆49 · Updated 3 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [arXiv, Oct 2023] ☆41 · Updated 11 months ago
- ☆42 · Updated 3 months ago
- Code for the article "What if Neural Networks had SVDs?", to be presented as a spotlight paper at NeurIPS 2020. ☆68 · Updated last month
- ☆62 · Updated 7 months ago
- A minimalist implementation of score-based diffusion model ☆120 · Updated 3 years ago
- ☆96 · Updated 2 years ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆57 · Updated last year
- ☆14 · Updated 3 months ago
- Official code for Long Expressive Memory (ICLR 2022, Spotlight) ☆69 · Updated 2 years ago
- Drop-in replacement for any ResNet with a significantly reduced memory footprint and better representation capabilities ☆207 · Updated 4 months ago