anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆97 Updated 2 years ago
Related projects
Alternatives and complementary repositories for toy-models-of-superposition
- ☆108 Updated last year
- ☆188 Updated last month
- ☆54 Updated 2 years ago
- ☆44 Updated this week
- Mechanistic Interpretability for Transformer Models ☆49 Updated 2 years ago
- ☆99 Updated 3 months ago
- ☆106 Updated last month
- Mechanistic Interpretability Visualizations using React ☆200 Updated 4 months ago
- Tools for studying developmental interpretability in neural networks. ☆77 Updated last week
- ☆109 Updated this week
- Emergent world representations: Exploring a sequence model trained on a synthetic task ☆170 Updated last year
- Sparse and discrete interpretability tool for neural networks ☆55 Updated 9 months ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆186 Updated this week
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆158 Updated last month
- Sparse Autoencoder Training Library ☆28 Updated 3 weeks ago
- 🧠 Starter templates for doing interpretability research ☆63 Updated last year
- Using sparse coding to find distributed representations used by neural networks. ☆188 Updated last year
- ☆26 Updated last year
- ☆76 Updated 9 months ago
- ☆71 Updated 3 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆189 Updated 4 months ago
- A library for efficient patching and automatic circuit discovery. ☆31 Updated last month
- ☆146 Updated last month
- Universal Neurons in GPT2 Language Models ☆27 Updated 5 months ago
- ☆101 Updated 3 months ago
- Steering Llama 2 with Contrastive Activation Addition ☆98 Updated 6 months ago
- Neural Networks and the Chomsky Hierarchy ☆187 Updated 7 months ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆14 Updated 7 months ago
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations ☆174 Updated 2 years ago
- ☆24 Updated 7 months ago