formll / dog
DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
☆64 · Updated 2 years ago
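The repository's title refers to the "Distance over Gradients" (DoG) step-size rule, which sets the learning rate from the distance travelled so far divided by the accumulated gradient norms. A minimal, unofficial sketch of that rule follows; the function name, the `r_eps` floor value, and the toy quadratic objective are illustrative assumptions, not taken from the repo's actual API:

```python
import math

def dog_sgd(grad_fn, x0, steps=500, r_eps=1e-4):
    """Sketch of the DoG step-size rule:
        eta_t = rbar_t / sqrt(sum_{i<=t} ||g_i||^2)
    where rbar_t = max(r_eps, max_{i<=t} ||x_i - x_0||)
    is the largest distance from the initial point seen so far.
    """
    x = list(x0)
    rbar = r_eps       # distance estimate, floored at a small r_eps > 0
    g_sq_sum = 0.0     # running sum of squared gradient norms
    for _ in range(steps):
        g = grad_fn(x)
        g_sq_sum += sum(gi * gi for gi in g)
        eta = rbar / (math.sqrt(g_sq_sum) + 1e-12)  # the DoG step size
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        dist = math.sqrt(sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0)))
        rbar = max(rbar, dist)  # track max distance from the start point
    return x

# toy usage (not from the repo): minimize f(x) = ||x||^2, gradient 2x
x_final = dog_sgd(lambda x: [2 * xi for xi in x], (1.0, 1.0))
```

Note the rule has no tunable learning rate: the step size starts near `r_eps` and grows automatically as the iterate moves away from its starting point.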
Alternatives and similar repositories for dog
Users interested in dog are comparing it to the libraries listed below.
- Omnigrok: Grokking Beyond Algorithmic Data ☆62 · Updated 2 years ago
- Parameter-Free Optimizers for Pytorch ☆130 · Updated last year
- ☆62 · Updated last year
- ☆246 · Updated last year
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆37 · Updated 3 years ago
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆40 · Updated 2 years ago
- ☆33 · Updated last year
- ☆73 · Updated last year
- ☆53 · Updated last month
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆190 · Updated 3 weeks ago
- Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆83 · Updated 3 years ago
- Code accompanying our paper "Feature Learning in Infinite-Width Neural Networks" (https://arxiv.org/abs/2011.14522) ☆62 · Updated 4 years ago
- Transformers with doubly stochastic attention ☆53 · Updated 3 years ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆92 · Updated 2 years ago
- LoRA for arbitrary JAX models and functions ☆144 · Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆70 · Updated last year
- nanoGPT-like codebase for LLM training ☆113 · Updated 3 months ago
- Official PyTorch implementation of NeuralSVD (ICML 2024) ☆22 · Updated last year
- Sketched linear operations for PyTorch ☆100 · Updated 3 months ago
- ☆61 · Updated 9 months ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆127 · Updated 2 years ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated 2 years ago
- ☆27 · Updated 3 years ago
- The Energy Transformer block, in JAX ☆63 · Updated 2 years ago
- ☆35 · Updated last year
- A Python package for generating concise, high-quality summaries of a probability distribution ☆57 · Updated 2 weeks ago
- Code for papers Linear Algebra with Transformers (TMLR) and What is my Math Transformer Doing? (AI for Maths Workshop, Neurips 2022) ☆76 · Updated last year
- Pytorch code for experiments on Linear Transformers ☆24 · Updated 2 years ago
- Lightning-like training API for JAX with Flax ☆45 · Updated last year
- IVON optimizer for neural networks based on variational learning. ☆81 · Updated last year