JasonGross/guarantees-based-mechanistic-interpretability

☆18

Alternatives and similar repositories for guarantees-based-mechanistic-interpretability

Users who are interested in guarantees-based-mechanistic-interpretability are comparing it to the repositories listed below.

samyadeepbasu / LocoGen
Localization of Knowledge in Text-to-Image Models
☆11 · Oct 8, 2024 · Updated last year
Butanium / tiny-activation-dashboard
A tiny, easily hackable implementation of a feature dashboard.
☆17 · Oct 21, 2025 · Updated 9 months ago
tim-lawson / mlsae
Multi-Layer Sparse Autoencoders (ICLR 2025)
☆30 · Feb 6, 2026 · Updated 5 months ago
atlas-computing-org / formal-specification-ide
☆19 · Feb 18, 2026 · Updated 5 months ago
neelnanda-io / Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max-activating dataset examples for language model neurons
☆14 · Feb 13, 2023 · Updated 3 years ago
ejnnr / cupbearer
A library for mechanistic anomaly detection
☆22 · Jan 9, 2025 · Updated last year
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆35 · Jun 11, 2025 · Updated last year
fiveai / understanding_safety_finetuning
Official code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)
☆12 · Oct 31, 2024 · Updated last year
aaronmueller / MIB
Landing page for MIB: A Mechanistic Interpretability Benchmark
☆26 · Aug 15, 2025 · Updated 11 months ago
manifoldmarkets / manifund
☆13 · Updated this week
EleutherAI / tokengrams
Efficiently computing & storing token n-grams from large corpora
☆28 · Jun 15, 2026 · Updated last month
anthropics / DecompositionFaithfulnessPaper
☆33 · Jul 17, 2023 · Updated 3 years ago
Phylliida / MambaLens
Mamba support for TransformerLens
☆20 · Sep 17, 2024 · Updated last year
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆145 · Apr 23, 2026 · Updated 3 months ago
hijohnnylin / neuronpedia-scorer
☆17 · Feb 14, 2024 · Updated 2 years ago
AHartNtkn / Dependent-Binary-Lambda-Calculus
A Dependently Typed Esolang
☆10 · Aug 4, 2017 · Updated 8 years ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al. 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
OliverEvans96 / maturin-nix-example
☆15 · Jul 21, 2023 · Updated 3 years ago
jbloomAus / SAEDashboard
☆109 · May 23, 2026 · Updated 2 months ago
abdoo8080 / lean-cvc5
A Foreign Function Interface (FFI) to the cvc5 solver in Lean.
☆25 · Jul 13, 2026 · Updated last week
CentreSecuriteIA / BELLS
Benchmarks for the Evaluation of LLM Supervision
☆35 · Jan 19, 2026 · Updated 6 months ago
alirezasalemi7 / DEDR-MM-FiD
Code for the paper A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering
☆14 · Aug 22, 2023 · Updated 2 years ago
tilde-research / sieve
Applying SAEs for fine-grained control
☆27 · Dec 15, 2024 · Updated last year
chasenorman / Formalized-Voting
☆13 · Jul 24, 2021 · Updated 5 years ago
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆17 · Dec 15, 2024 · Updated last year
betaboon / mobile-nixos-flake
☆12 · Mar 23, 2024 · Updated 2 years ago
Mohamed-Imed-Eddine / Harmonic-NAS
Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices (ACML 2023)
☆16 · May 7, 2024 · Updated 2 years ago
bilal-chughtai / rep-theory-mech-interp
☆31 · May 4, 2023 · Updated 3 years ago
monasticacademy / logical-induction
Code to support the guide to logical induction for software engineers
☆11 · Jul 12, 2026 · Updated last week
apartresearch / DarkBench
Benchmarking Dark Patterns in LLMs (ICLR 2025)
☆18 · Mar 29, 2025 · Updated last year
blei-lab / circuitry
☆16 · Oct 30, 2024 · Updated last year
noanabeshima / tinymodel
A TinyStories LM with SAEs and transcoders
☆14 · Apr 3, 2025 · Updated last year
jorispos / ConceptorSteering
☆16 · Mar 13, 2025 · Updated last year
EleutherAI / deep-ignorance
☆20 · Jan 7, 2026 · Updated 6 months ago
uemurax / morg
Organize mathematical thoughts
☆20 · Oct 6, 2023 · Updated 2 years ago
annahdo / implementing_activation_steering
A collection of different ways to access and modify internal model activations in LLMs
☆24 · Oct 18, 2024 · Updated last year
ethz-spylab / unlearning-vs-safety
☆27 · Oct 6, 2024 · Updated last year
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆29 · Nov 20, 2024 · Updated last year
noranta4 / ASIF
Personal implementation of ASIF by Antonio Norelli
☆26 · May 24, 2024 · Updated 2 years ago