hijohnnylin/automated-interpretability

☆22

Alternatives and similar repositories for automated-interpretability

Users interested in automated-interpretability are comparing it to the repositories listed below.

noanabeshima / tinymodel
A TinyStories LM with SAEs and transcoders
☆14 · Apr 3, 2025 · Updated last year
neelnanda-io / Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
☆13 · Feb 13, 2023 · Updated 3 years ago
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Jul 18, 2024 · Updated last year
rloganiv / kglm-data
Code used to create the Linked WikiText-2 dataset
☆16 · May 22, 2023 · Updated 2 years ago
curt-tigges / crosslayer-coding
☆17 · Jul 9, 2025 · Updated 9 months ago
science-of-finetuning / diffing-toolkit
A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.
☆71 · Updated this week
cvenhoff / steering-thinking-llms
☆34 · Jul 9, 2025 · Updated 9 months ago
vandium-io / aws-param-env
☆20 · Dec 30, 2022 · Updated 3 years ago
tilde-research / sieve
Applying SAEs for fine-grained control
☆26 · Dec 15, 2024 · Updated last year
bartbussmann / BatchTopK
Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)
☆63 · Jul 24, 2025 · Updated 8 months ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆296 · Jul 20, 2024 · Updated last year
decoderesearch / SAELens
Training Sparse Autoencoders on Language Models
☆1,312 · Mar 19, 2026 · Updated 3 weeks ago
curt-tigges / probity
☆20 · Apr 10, 2025 · Updated last year
jbloomAus / SAEDashboard
☆92 · Dec 18, 2025 · Updated 3 months ago
lasgroup / SafetyPolytope
Learning Safety Constraints for Large Language Models (ICML2025)
☆34 · Aug 4, 2025 · Updated 8 months ago
goodfire-ai / scribe
☆80 · Feb 18, 2026 · Updated last month
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆144 · Sep 14, 2022 · Updated 3 years ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆29 · May 23, 2024 · Updated last year
aaronmueller / MIB
Landing page for MIB: A Mechanistic Interpretability Benchmark
☆24 · Aug 15, 2025 · Updated 7 months ago
danielway / nexrad-volumetric-renderer
Project exploring 3D volumetric rendering of NEXRAD radar data.
☆12 · Oct 23, 2023 · Updated 2 years ago
diegoinacio / svg-experiments
✒️ A gallery of experiments with Scalable Vector Graphics (SVG) and interactive visualizations.
☆13 · Jan 6, 2023 · Updated 3 years ago
hijohnnylin / neuronpedia
open source interpretability platform 🧠
☆782 · Updated this week
DavideBuffelli / SizeShiftReg
Code for the paper "SizeShiftReg: a Regularization Method for Improving Size-Generalization in Graph Neural Networks"
☆12 · Jan 17, 2023 · Updated 3 years ago
anthropics / hypercorn
Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn.
☆15 · Jan 12, 2026 · Updated 3 months ago
safety-research / finetuning-auditor
Auditing agents for fine-tuning safety
☆20 · Oct 21, 2025 · Updated 5 months ago
UlisseMini / ana
The AI that helps you achieve your goals
☆11 · Feb 4, 2024 · Updated 2 years ago
ezyang / ai-blindspots
Blindspots in LLMs I've noticed while AI coding. Sonnet family emphasis.
☆13 · Mar 20, 2025 · Updated last year
alejandro-lozano-dev / Eiffel2
Neural Network architecture Visualization tool
☆13 · Jul 4, 2020 · Updated 5 years ago
Chillee / lit-llama
Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code
☆10 · Aug 29, 2023 · Updated 2 years ago
anthropics / rogue-deploy-eval
☆14 · Jan 21, 2025 · Updated last year
alacritty / termbenchbot
Automated terminal emulator benchmarks
☆23 · Mar 30, 2026 · Updated last week
KarlXing / LUSR
Domain Adaptation In Reinforcement Learning Via Latent Unified State Representation (AAAI 2021)
☆31 · Sep 3, 2021 · Updated 4 years ago
eburghar / l3charts
Customizable charts made with TikZ and LaTeX3
☆14 · Feb 11, 2023 · Updated 3 years ago
johanhelsing / bevy_touch_stick
An analog touch screen joystick that pretends to be a bevy gamepad
☆13 · Jul 13, 2024 · Updated last year
KempnerInstitute / llm_uncertainty
Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"
☆11 · Apr 15, 2024 · Updated last year
safety-research / open-source-alignment-faking
Open Source Replication of Anthropic's Alignment Faking Paper
☆56 · Apr 4, 2025 · Updated last year
hrtan / MoSo
☆10 · Oct 20, 2023 · Updated 2 years ago
shauli-ravfogel / adv-kernel-removal
☆12 · Oct 23, 2022 · Updated 3 years ago
saprmarks / dictionary_learning
☆408 · Aug 21, 2025 · Updated 7 months ago