neelnanda-io / Grokking
A Mechanistic Interpretability Analysis of Grokking
☆21 Updated 2 years ago
Alternatives and similar repositories for Grokking:
Users who are interested in Grokking are comparing it to the repositories listed below.
- ☆26 Updated last year
- gzip Predicts Data-dependent Scaling Laws ☆34 Updated 10 months ago
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons ☆12 Updated 2 years ago
- ☆121 Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆111 Updated 4 months ago
- ☆26 Updated last year
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆120 Updated 2 years ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆95 Updated 2 months ago
- unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆78 Updated 2 years ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆73 Updated 4 months ago
- ☆62 Updated 2 years ago
- A domain-specific probabilistic programming language for modeling and inference with language models ☆126 Updated last year
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆36 Updated 2 years ago
- Universal Neurons in GPT2 Language Models ☆27 Updated 10 months ago
- we got you bro ☆35 Updated 8 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 Updated 11 months ago
- ☆51 Updated 11 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆69 Updated 10 months ago
- ☆72 Updated 2 months ago
- ☆22 Updated 2 months ago
- ☆19 Updated last week
- ☆128 Updated 3 weeks ago
- Learning Universal Predictors ☆74 Updated 8 months ago
- ☆28 Updated 3 weeks ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆189 Updated 10 months ago
- Redwood Research's transformer interpretability tools ☆14 Updated 3 years ago
- Open source interpretability artefacts for R1. ☆82 Updated this week
- Sparse and discrete interpretability tool for neural networks ☆62 Updated last year
- ☆90 Updated 2 months ago
- ☆36 Updated last month