neelnanda-io / Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
☆12 Updated 2 years ago
Alternatives and similar repositories for Neuroscope:
Users who are interested in Neuroscope are comparing it to the libraries listed below.
- A TinyStories LM with SAEs and transcoders ☆11 Updated last month
- ☆16 Updated last year
- ☆15 Updated 5 months ago
- A Mechanistic Interpretability Analysis of Grokking ☆21 Updated 2 years ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features ☆15 Updated 3 months ago
- ☆31 Updated this week
- Certified Reasoning with Language Models ☆31 Updated last year
- ☆55 Updated 3 months ago
- ☆50 Updated 4 months ago
- Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network" ☆20 Updated 8 months ago
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations ☆14 Updated 10 months ago
- ☆12 Updated this week
- ☆27 Updated 3 months ago
- ☆19 Updated last year
- ☆26 Updated last year
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆78 Updated this week
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆39 Updated 3 months ago
- Experimental LLM interface exploring new ways to use AI to improve human thinking ☆15 Updated 2 weeks ago
- A dataset of alignment research and code to reproduce it ☆73 Updated last year
- ☆25 Updated 10 months ago
- ☆20 Updated 3 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 Updated 8 months ago
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search. ☆30 Updated 2 months ago
- Redwood Research's transformer interpretability tools ☆14 Updated 2 years ago
- ☆12 Updated 2 years ago
- Measuring the situational awareness of language models ☆34 Updated last year
- Mechanistic Interpretability for Transformer Models ☆49 Updated 2 years ago
- Training hybrid models for dummies. ☆20 Updated last month
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code ☆10 Updated last year
- This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. ☆26 Updated 5 months ago