teddykoker / grokking
PyTorch implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
☆32 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for grokking
- unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆63 · Updated 2 years ago
- Computing the eigenvalues of Neural Tangent Kernel and Conjugate Kernel (aka NNGP kernel) over the boolean cube ☆47 · Updated 5 years ago
- Hessian spectral density estimation in TF and Jax ☆115 · Updated 4 years ago
- Code for the paper: "Tensor Programs II: Neural Tangent Kernel for Any Architecture" ☆97 · Updated 4 years ago
- Code accompanying our paper "Feature Learning in Infinite-Width Neural Networks" (https://arxiv.org/abs/2011.14522) ☆58 · Updated 3 years ago
- Pytorch implementation of preconditioned stochastic gradient descent (affine group preconditioner, low-rank approximation preconditioner … ☆127 · Updated last month
- ☆67 · Updated 5 years ago
- This repository contains the Julia code for the paper "Competitive Gradient Descent" ☆23 · Updated 4 years ago
- Public Codebase for Rethinking Parameter Counting: Effective Dimensionality Revisited ☆36 · Updated last year
- [NeurIPS'19] Deep Equilibrium Models Jax Implementation ☆37 · Updated 4 years ago
- paper lists and information on mean-field theory of deep learning ☆75 · Updated 5 years ago
- Structured matrices for compressing neural networks ☆67 · Updated last year
- ☆78 · Updated 3 years ago
- codebase for "A Theory of the Inductive Bias and Generalization of Kernel Regression and Wide Neural Networks" ☆49 · Updated last year
- Source code for ICLR 2020 paper: "Learning to Guide Random Search" ☆39 · Updated 2 months ago
- ☆97 · Updated 2 years ago
- Image augmentation library for Jax ☆37 · Updated 7 months ago
- Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks" ☆60 · Updated 2 years ago
- ☆36 · Updated 2 years ago
- Monotone operator equilibrium networks ☆51 · Updated 4 years ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 · Updated 2 years ago
- Omnigrok: Grokking Beyond Algorithmic Data ☆49 · Updated last year
- ☆22 · Updated last year
- simple JAX-/NumPy-based implementations of NGD with exact/approximate Fisher Information Matrix both in parameter-space and function-spac… ☆14 · Updated 4 years ago
- Parameter-Free Optimizers for Pytorch ☆109 · Updated 6 months ago
- Implementations and checkpoints for ResNet, Wide ResNet, ResNeXt, ResNet-D, and ResNeSt in JAX (Flax). ☆104 · Updated 2 years ago
- Code for: "Neural Rough Differential Equations for Long Time Series", (ICML 2021) ☆115 · Updated 3 years ago
- ☆49 · Updated 4 years ago
- DeepOBS: A Deep Learning Optimizer Benchmark Suite ☆103 · Updated 11 months ago
- Limitations of the Empirical Fisher Approximation ☆45 · Updated 4 years ago