neelnanda-io / Grokking
A Mechanistic Interpretability Analysis of Grokking
★21 · Updated 2 years ago
Alternatives and similar repositories for Grokking:
Users who are interested in Grokking compare it to the libraries listed below.
- Universal Neurons in GPT2 Language Models · ★27 · Updated 9 months ago
- 🧠 Starter templates for doing interpretability research · ★67 · Updated last year
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. · ★35 · Updated last year
- Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" · ★76 · Updated 2 years ago
- ★61 · Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 · ★108 · Updated 3 months ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper · ★117 · Updated 2 years ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… · ★26 · Updated 10 months ago
- ★121 · Updated last year
- ★26 · Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear" · ★72 · Updated 3 months ago
- we got you bro · ★35 · Updated 7 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface · ★90 · Updated last month
- Evaluation of neuro-symbolic engines · ★35 · Updated 7 months ago
- gzip Predicts Data-dependent Scaling Laws · ★34 · Updated 9 months ago