goodfire-ai/causalab

☆73

Alternatives and similar repositories for causalab

Users who are interested in causalab often compare it to the repositories listed below.

hannamw / MIB-circuit-track
☆24 · Jun 30, 2025 · Updated 10 months ago
aaronmueller / MIB
Landing page for MIB: A Mechanistic Interpretability Benchmark
☆25 · Aug 15, 2025 · Updated 9 months ago
tim-lawson / mlsae
Multi-Layer Sparse Autoencoders (ICLR 2025)
☆30 · Feb 6, 2026 · Updated 3 months ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆58 · Oct 30, 2025 · Updated 6 months ago
dtch1997 / steering-bench
Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"
☆21 · Dec 14, 2024 · Updated last year
duykhuongnguyen / MAT-Steer
☆19 · Aug 19, 2025 · Updated 9 months ago
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Jul 18, 2024 · Updated last year
s-ball-10 / jailbreak_dynamics
☆25 · Jun 13, 2024 · Updated last year
noanabeshima / matryoshka-saes
☆28 · Nov 28, 2024 · Updated last year
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆66 · Oct 27, 2024 · Updated last year
tilde-research / sieve
Applying SAEs for fine-grained control
☆27 · Dec 15, 2024 · Updated last year
yiksiu-chan / SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
☆14 · Mar 7, 2026 · Updated 2 months ago
saprmarks / geometry-of-truth
☆107 · Aug 8, 2024 · Updated last year
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆97 · Dec 31, 2025 · Updated 4 months ago
johnnovak / twyg
Generative tree visualiser for Python
☆16 · Sep 15, 2020 · Updated 5 years ago
neelnanda-io / Grokking
A Mechanistic Interpretability Analysis of Grokking
☆27 · Sep 26, 2022 · Updated 3 years ago
aryamanarora / causalgym
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
☆53 · Nov 30, 2024 · Updated last year
vedantpalit / Towards-Vision-Language-Mechanistic-Interpretability
This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce…
☆25 · Feb 16, 2026 · Updated 3 months ago
abhi1nandy2 / yesbut_dataset
YesBut - Multimodal Satire Comprehension Dataset
☆19 · Oct 23, 2024 · Updated last year
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
JunsolKim / RepresentationPoliticalLLM
Kim, J., Evans, J., & Schein, A. (2025). Linear Representations of Political Perspective Emerge in Large Language Models. ICLR.
☆25 · Mar 27, 2025 · Updated last year
kheilbron / caldera
Calling disease-related genes
☆16 · Apr 1, 2026 · Updated last month
erdogant / d3heatmap
d3heatmap is a Python package to create interactive heatmaps based on d3js.
☆11 · Sep 14, 2023 · Updated 2 years ago
aypan17 / latentqa
☆33 · Nov 16, 2025 · Updated 6 months ago
andyrdt / refusal_direction
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
☆390 · Jun 13, 2025 · Updated 11 months ago
nilenso / vyakaran
the handbook for nilenso
☆14 · Feb 6, 2024 · Updated 2 years ago
yihui / xran
Xie's R Archive Network (experimental and for my personal interest only)
☆26 · Sep 6, 2021 · Updated 4 years ago
riccardotommasini / imkg
The Internet Memes Knowledge Graph
☆18 · Oct 18, 2024 · Updated last year
tigerchen52 / GLADIS
GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23)
☆18 · Jun 24, 2024 · Updated last year
velten-group / crispat
☆19 · Apr 4, 2025 · Updated last year
jaehunjung1 / impossible-distillation
☆18 · Jul 3, 2024 · Updated last year
curt-tigges / probity
☆19 · Apr 10, 2025 · Updated last year
wikilinks / conll03_nel_eval
Python evaluation scripts for AIDA-formatted CoNLL data
☆20 · Aug 4, 2014 · Updated 11 years ago
nerdslab / SwapVAE
PyTorch implementation of Swap-VAE: A self-supervised approach for generating neural activity
☆13 · Nov 17, 2021 · Updated 4 years ago
paolobrasolin / string-diagrams
Create string diagrams with LaTeX!
☆14 · Jan 3, 2025 · Updated last year
montemac / activation_additions
Algebraic value editing in pretrained language models
☆70 · Nov 1, 2023 · Updated 2 years ago
ninell-oldenburg / social-contracts
☆13 · Mar 12, 2024 · Updated 2 years ago
mlepori1 / NeuroSurgeon
NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers
☆43 · Feb 12, 2025 · Updated last year
TransluceAI / jailbreaking-frontier-models
☆26 · Sep 3, 2025 · Updated 8 months ago