FlyingPumba/InterpBench

A benchmark for mechanistic discovery of circuits in Transformers

☆17

Alternatives and similar repositories for InterpBench

Users who are interested in InterpBench are comparing it to the repositories listed below.

ag8 / sha-transformer
☆12 · Jul 8, 2024 · Updated 2 years ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
cvndsh / rebus
REBUS: A Robust Evaluation Benchmark of Understanding Symbols
☆13 · Aug 13, 2024 · Updated last year
ejnnr / cupbearer
A library for mechanistic anomaly detection
☆22 · Jan 9, 2025 · Updated last year
ordavid-s / snmf-mlp-decomposition
☆15 · Jul 7, 2026 · Updated last week
lacoco-lab / decompiling_transformers
Repo for Paper: Discovering Interpretable Algorithms by Decompiling Transformers to RASP
☆15 · May 25, 2026 · Updated last month
alan-cooney / transformer-from-scratch
Decoder only transformer, built from scratch with PyTorch
☆33 · Oct 22, 2023 · Updated 2 years ago
fiveai / understanding_safety_finetuning
Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)
☆12 · Oct 31, 2024 · Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆293 · Oct 1, 2024 · Updated last year
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆48 · May 31, 2024 · Updated 2 years ago
hannamw / EAP-IG
☆81 · May 23, 2026 · Updated last month
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆35 · Jun 11, 2025 · Updated last year
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆58 · Oct 30, 2025 · Updated 8 months ago
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆20 · Jan 28, 2025 · Updated last year
jorispos / ConceptorSteering
☆16 · Mar 13, 2025 · Updated last year
aquelemiguel / geoguessr-plus
🌍 A sleek Chrome extension that enhances your Geoguessr experience with advanced round info and more.
☆10 · Apr 18, 2021 · Updated 5 years ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆99 · Dec 31, 2025 · Updated 6 months ago
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆17 · Oct 21, 2025 · Updated 8 months ago
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆15 · Apr 15, 2022 · Updated 4 years ago
Blkalkin / Optimal-TestTime
☆10 · Mar 24, 2025 · Updated last year
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆33 · May 23, 2024 · Updated 2 years ago
DanielSc4 / Dynamic-Activation-Composition
Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"
☆14 · Nov 22, 2024 · Updated last year
efarrell1 / train_sparse_autoencoder
Trains Sparse Autoencoders based on outputs from language models
☆11 · Oct 7, 2024 · Updated last year
princeton-nlp / Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
☆70 · Aug 15, 2025 · Updated 11 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆68 · Oct 27, 2024 · Updated last year
srush / Tensor-Puzzles-Penzai
☆22 · Apr 22, 2024 · Updated 2 years ago
LukeBailey181 / obfuscated-activations
Codebase for Obfuscated Activations Bypass LLM Latent-Space Defenses
☆31 · Feb 11, 2025 · Updated last year
goodfire-ai / scribe-task-suite
A suite of interpretability tasks to evaluate agents using Scribe for notebook access
☆18 · Oct 2, 2025 · Updated 9 months ago
TeunvdWeij / sandbagging
☆20 · Nov 15, 2024 · Updated last year
firubii / KirbyYAML
A YAML editor for the modern Kirby games
☆13 · Aug 23, 2025 · Updated 10 months ago
ALT-JS / OthelloSAE
CS194-196 Course Project
☆14 · Feb 20, 2025 · Updated last year
siefkenj / 2020-MAT-335-webpage
Course webpage for MAT335 at the University of Toronto
☆13 · Apr 3, 2020 · Updated 6 years ago
aadityasingh / icl-dynamics
☆26 · Feb 20, 2026 · Updated 5 months ago
GangweiJiang / FvForgetting
☆15 · Apr 20, 2025 · Updated last year
firubii / JamBuilder
A WIP level editor for Kirby: Star Allies
☆10 · Dec 8, 2022 · Updated 3 years ago
RobertCsordas / onion_representations
☆13 · Aug 19, 2024 · Updated last year
JasonGross / guarantees-based-mechanistic-interpretability
☆18 · Updated this week
EleutherAI / training-jacobian
☆24 · Dec 11, 2024 · Updated last year