formll / dog
DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
☆ 58 · Updated last year
Alternatives and similar repositories for dog:
Users who are interested in dog are comparing it to the libraries listed below.
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs · ☆ 36 · Updated 2 years ago
- (no description) · ☆ 52 · Updated 4 months ago
- Parameter-Free Optimizers for PyTorch · ☆ 113 · Updated 9 months ago
- Euclidean Wasserstein-2 optimal transportation · ☆ 44 · Updated last year
- Transformers with doubly stochastic attention · ☆ 45 · Updated 2 years ago
- Lightning-like training API for JAX with Flax · ☆ 38 · Updated 2 months ago
- Code for https://arxiv.org/abs/2406.04329 · ☆ 58 · Updated 2 months ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) · ☆ 122 · Updated last year
- (no description) · ☆ 42 · Updated this week
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… · ☆ 35 · Updated last year
- PyTorch linear operators for curvature matrices (Hessian, Fisher/GGN, KFAC, ...) · ☆ 31 · Updated this week
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. · ☆ 35 · Updated last year
- PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… · ☆ 168 · Updated 2 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 · ☆ 20 · Updated last year
- (no description) · ☆ 63 · Updated 2 months ago
- (no description) · ☆ 159 · Updated 2 months ago
- Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" · ☆ 72 · Updated 2 years ago
- (no description) · ☆ 36 · Updated last year
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX · ☆ 82 · Updated last year
- nanoGPT-like codebase for LLM training · ☆ 89 · Updated this week
- 🧱 Modula software package · ☆ 146 · Updated this week
- (no description) · ☆ 24 · Updated 2 years ago
- [ICML 2024] SINGD: KFAC-like Structured Inverse-Free Natural Gradient Descent (http://arxiv.org/abs/2312.05705) · ☆ 21 · Updated 3 months ago
- (no description) · ☆ 60 · Updated 3 years ago
- A centralized place for deep thinking code and experiments · ☆ 81 · Updated last year
- PyTorch-like dataloaders for JAX. · ☆ 75 · Updated 4 months ago
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective · ☆ 49 · Updated 9 months ago
- (no description) · ☆ 33 · Updated 5 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ☆ 70 · Updated 3 months ago
- Code for paper "Compositional Sculpting of Iterative Generative Processes" · ☆ 20 · Updated last year