neelnanda-io / Grokking
A Mechanistic Interpretability Analysis of Grokking
★22 · Updated 2 years ago
Alternatives and similar repositories for Grokking
Users who are interested in Grokking are comparing it to the libraries listed below.
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper · ★129 · Updated 2 years ago
- Starter templates for doing interpretability research · ★73 · Updated 2 years ago
- we got you bro · ★36 · Updated last year
- Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" · ★79 · Updated 3 years ago
- ★126 · Updated last year
- ★141 · Updated 2 weeks ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… · ★28 · Updated last year
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) · ★192 · Updated last year
- Steering vectors for transformer language models in PyTorch / Hugging Face · ★121 · Updated 6 months ago
- ★57 · Updated last month
- Code for reproducing our paper "Not All Language Model Features Are Linear" · ★77 · Updated 9 months ago
- ★98 · Updated 4 months ago
- Emergent world representations: Exploring a sequence model trained on a synthetic task · ★188 · Updated 2 years ago
- ★21 · Updated 4 months ago
- A library for bridging Python and HTML/JavaScript (via Svelte) for creating interactive visualizations · ★14 · Updated last year
- Tools for studying developmental interpretability in neural networks. · ★101 · Updated 2 months ago
- Implementation of OpenAI's "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" paper. · ★39 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. · ★156 · Updated 2 months ago
- ★30 · Updated 5 months ago
- Sparse Autoencoder Training Library · ★54 · Updated 4 months ago
- Universal Neurons in GPT2 Language Models · ★30 · Updated last year
- Open source interpretability artefacts for R1. · ★157 · Updated 4 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). · ★214 · Updated 8 months ago
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons · ★13 · Updated 2 years ago
- Sparse and discrete interpretability tool for neural networks · ★63 · Updated last year
- ★106 · Updated 6 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … · ★207 · Updated this week
- Mechanistic Interpretability Visualizations using React · ★281 · Updated 8 months ago
- ★23 · Updated 7 months ago
- Code associated to papers on superposition (in ML interpretability) · ★31 · Updated 2 years ago