timaeus-research/devinterp

Tools for studying developmental interpretability in neural networks.

☆125

Alternatives and similar repositories for devinterp

Users who are interested in devinterp are comparing it to the libraries listed below.

JasonGross / guarantees-based-mechanistic-interpretability
☆17 · Updated this week
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆245 · Dec 16, 2024 · Updated last year
Butanium / tiny-activation-dashboard
A tiny, easily hackable implementation of a feature dashboard.
☆15 · Oct 21, 2025 · Updated 4 months ago
noanabeshima / tinymodel
A TinyStories LM with SAEs and transcoders
☆14 · Apr 3, 2025 · Updated 10 months ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramár et al., 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
adamkarvonen / SAEBench
☆150 · Dec 30, 2025 · Updated 2 months ago
understanding-search / structured-representations-maze-transformers
see github.com/understanding-search/maze-transformer
☆10 · Dec 8, 2023 · Updated 2 years ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆34 · Jun 11, 2025 · Updated 8 months ago
goodfire-ai / spd
Stochastic Parameter Decomposition
☆66 · Updated this week
LRudL / evalugator
(Model-written) LLM evals library
☆18 · Dec 13, 2024 · Updated last year
marcus-jw / Targeted-Manipulation-and-Deception-in-LLMs
Codebase for "On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback". This repo implements a generative multi-tur…
☆23 · Dec 3, 2024 · Updated last year
safety-research / safety-examples
☆25 · Nov 11, 2025 · Updated 3 months ago
AsaCooperStickland / situational-awareness-evals
Measuring the situational awareness of language models
☆40 · Feb 12, 2024 · Updated 2 years ago
neelnanda-io / Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max-activating dataset examples for language model neurons
☆13 · Feb 13, 2023 · Updated 3 years ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆326 · Dec 18, 2024 · Updated last year
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆292 · Jul 20, 2024 · Updated last year
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆90 · Dec 31, 2025 · Updated 2 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆243 · Updated this week
AlignmentResearch / gpt-4-novel-apis-attacks
☆23 · Dec 28, 2023 · Updated 2 years ago
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆47 · May 31, 2024 · Updated last year
longtermrisk / openweights
A Python SDK for LLM fine-tuning and inference on RunPod infrastructure
☆19 · Feb 16, 2026 · Updated last week
saprmarks / feature-circuits
☆209 · Oct 14, 2025 · Updated 4 months ago
TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆3,112 · Updated this week
nrimsky / LM-exp
LLM experiments done during SERI MATS, focusing on activation steering and interpreting activation spaces
☆101 · Sep 21, 2023 · Updated 2 years ago
Phylliida / MambaLens
Mamba support for TransformerLens
☆19 · Sep 17, 2024 · Updated last year
nickkeesG / Pantheon
Experimental LLM interface exploring new ways to use AI to improve human thinking
☆19 · Updated this week
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆16 · Dec 15, 2024 · Updated last year
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆30 · May 28, 2024 · Updated last year
decoderesearch / SAELens
Training Sparse Autoencoders on Language Models
☆1,219 · Updated this week
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆238 · Aug 11, 2025 · Updated 6 months ago
ArthurConmy / Automatic-Circuit-Discovery
☆271 · Oct 1, 2024 · Updated last year
jqhoogland / autoanki
Automatically create Anki cards from text using language models
☆20 · Jan 7, 2023 · Updated 3 years ago
adamimos / epsilon-transformers
epsilon machines and transformers!
☆34 · Jul 9, 2025 · Updated 7 months ago
bilal-chughtai / rep-theory-mech-interp
☆28 · May 4, 2023 · Updated 2 years ago
jiahai-feng / binding-iclr
☆19 · Mar 5, 2024 · Updated last year
UlisseMini / procgen-tools
Tools for running experiments on RL agents in procgen environments
☆20 · Apr 5, 2024 · Updated last year
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53 · Jun 1, 2022 · Updated 3 years ago
saprmarks / dictionary_learning
☆396 · Aug 21, 2025 · Updated 6 months ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆137 · Sep 14, 2022 · Updated 3 years ago