hijohnnylin/neuronpedia

open source interpretability platform 🧠

☆1,086

Alternatives and similar repositories for neuronpedia

Users interested in neuronpedia are comparing it to the libraries listed below.

decoderesearch / SAELens
Training Sparse Autoencoders on Language Models
☆1,485 · Updated this week
decoderesearch / circuit-tracer
☆2,875 · Jul 18, 2026 · Updated last week
TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆3,716 · Updated this week
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆1,000 · Updated this week
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆358 · Apr 30, 2026 · Updated 2 months ago
jbloomAus / SAEDashboard
☆109 · May 23, 2026 · Updated 2 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆268 · Updated this week
decoderesearch / automated-interpretability
☆24 · Feb 13, 2026 · Updated 5 months ago
adamkarvonen / SAEBench
☆178 · May 1, 2026 · Updated 2 months ago
saprmarks / dictionary_learning
☆428 · Aug 21, 2025 · Updated 11 months ago
kitft / natural_language_autoencoders
☆909 · Jun 9, 2026 · Updated last month
ajobi-uhc / seer
This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix …
☆146 · Feb 8, 2026 · Updated 5 months ago
callummcdougall / ARENA_3.0
☆1,190 · Updated this week
goodfire-ai / param-decomp
Parameter Decomposition
☆136 · Updated this week
curt-tigges / crosslayer-coding
☆18 · Jul 9, 2025 · Updated last year
ndif-team / nnterp
Unified access to Large Language Model modules using NNsight
☆116 · Updated this week
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆734 · Updated this week
jacobdunefsky / transcoder_circuits
☆212 · Nov 17, 2024 · Updated last year
safety-research / persona_vectors
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
☆452 · Apr 22, 2026 · Updated 3 months ago
goodfire-ai / sdxl-turbo-interpretability
☆49 · May 27, 2025 · Updated last year
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆183 · Apr 21, 2025 · Updated last year
EleutherAI / attribute
☆16 · Nov 14, 2025 · Updated 8 months ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆267 · Feb 27, 2026 · Updated 4 months ago
meridianlabs-ai / inspect_petri
An alignment auditing agent capable of quickly exploring alignment hypothesis
☆1,274 · Updated this week
saprmarks / feature-circuits
☆223 · Oct 14, 2025 · Updated 9 months ago
OpenMOSS / Llamascopium
Performant framework for training, analyzing and visualizing Sparse Autoencoders (SAEs) and their frontier variants.
☆223 · Updated this week
interp-reasoning / thought-anchors
⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.
☆137 · Oct 27, 2025 · Updated 8 months ago
nickjiang2378 / interp-embed
A toolkit for embedding text datasets with sparse autoencoders
☆30 · Mar 24, 2026 · Updated 4 months ago
anthropics / jacobian-lens
Companion code for the global workspace interpretability paper
☆1,578 · Updated this week
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols for running control experiments.
☆213 · Updated this week
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆210 · Mar 12, 2026 · Updated 4 months ago
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆250 · Mar 16, 2026 · Updated 4 months ago
Dakingrai / awesome-mechanistic-interpretability-lm-papers
☆260 · Nov 22, 2024 · Updated last year
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆99 · Dec 31, 2025 · Updated 6 months ago
UKGovernmentBEIS / inspect_ai
Inspect: A framework for large language model evaluations
☆2,411 · Updated this week
openai / sparse_autoencoder
☆597 · Jul 19, 2024 · Updated 2 years ago
EleutherAI / clt-training
Sparsify transformers with cross-layer transcoders
☆26 · Nov 14, 2025 · Updated 8 months ago
adamkarvonen / activation_oracles
☆95 · Apr 18, 2026 · Updated 3 months ago
safety-research / assistant-axis
The Assistant Axis is a direction in activation space that captures how "Assistant-like" a model's behavior is. Models can drift away fro…
☆158 · Jan 20, 2026 · Updated 6 months ago