Sea-Snell / grokking
Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
☆78 · Updated 2 years ago
Alternatives and similar repositories for grokking
Users interested in grokking are comparing it to the repositories listed below.
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. ☆36 · Updated last year
- ☆25 · Updated 2 years ago
- PyTorch implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆36 · Updated 3 years ago
- Omnigrok: Grokking Beyond Algorithmic Data ☆56 · Updated 2 years ago
- ☆67 · Updated 5 months ago
- Code accompanying our paper "Feature Learning in Infinite-Width Neural Networks" (https://arxiv.org/abs/2011.14522) ☆62 · Updated 4 years ago
- ☆80 · Updated last year
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆58 · Updated last year
- Scaling scaling laws with board games. ☆48 · Updated last year
- nanoGPT-like codebase for LLM training ☆94 · Updated last month
- ☆28 · Updated last month
- ☆177 · Updated last year
- Mechanistic Interpretability for Transformer Models ☆50 · Updated 2 years ago
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici… ☆106 · Updated last year
- A centralized place for deep thinking code and experiments ☆84 · Updated last year
- LoRA for arbitrary JAX models and functions ☆136 · Updated last year
- Sparse Autoencoder Training Library ☆49 · Updated 2 weeks ago
- A library to create and manage configuration files, especially for machine learning projects. ☆78 · Updated 3 years ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 · Updated 2 years ago
- Code for paper "Compositional Sculpting of Iterative Generative Processes" ☆21 · Updated last year
- Emergent world representations: Exploring a sequence model trained on a synthetic task ☆181 · Updated last year
- ☆114 · Updated 9 months ago
- Code associated to papers on superposition (in ML interpretability) ☆28 · Updated 2 years ago
- ☆27 · Updated last year
- ☆26 · Updated 2 years ago
- Neural Networks and the Chomsky Hierarchy ☆206 · Updated last year
- minGPT in JAX ☆48 · Updated 3 years ago
- Code for our paper "Generative Flow Networks for Discrete Probabilistic Modeling" ☆82 · Updated 2 years ago
- Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks" ☆59 · Updated 3 years ago
- ☆12 · Updated 4 months ago