neelnanda-io / Grokking
A Mechanistic Interpretability Analysis of Grokking
☆27 · updated 3 years ago
Alternatives and similar repositories for Grokking
Users interested in Grokking are comparing it to the repositories listed below.
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper · ☆135 · updated 3 years ago
- Tools for studying developmental interpretability in neural networks. · ☆126 · updated last month
- we got you bro · ☆37 · updated last year
- ☆153 · updated 5 months ago
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons · ☆13 · updated 2 years ago
- Learning Universal Predictors · ☆81 · updated last year
- Emergent world representations: Exploring a sequence model trained on a synthetic task · ☆201 · updated 2 years ago
- ☆132 · updated 2 years ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) · ☆198 · updated last year
- Latent Program Network (from the "Searching Latent Program Spaces" paper) · ☆107 · updated 2 months ago
- Attribution-based Parameter Decomposition · ☆33 · updated 8 months ago
- ☆65 · updated last week
- unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" · ☆83 · updated 3 years ago
- Brain-Inspired Modular Training (BIMT), a method for making neural networks more modular and interpretable. · ☆175 · updated 2 years ago
- 🧠 Starter templates for doing interpretability research · ☆76 · updated 2 years ago
- ☆29 · updated last year
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). · ☆240 · updated last year
- Open source interpretability artefacts for R1. · ☆170 · updated 9 months ago
- A domain-specific probabilistic programming language for modeling and inference with language models · ☆141 · updated 9 months ago
- ☆17 · updated this week
- epsilon machines and transformers! · ☆34 · updated 7 months ago
- Universal Neurons in GPT2 Language Models · ☆30 · updated last year