aaronmueller/MIB

Landing page for MIB: A Mechanistic Interpretability Benchmark

☆26

Alternatives and similar repositories for MIB

Users that are interested in MIB are comparing it to the libraries listed below.

hannamw / MIB-circuit-track
☆24 · Jun 30, 2025 · Updated last year
goodfire-ai / causalab
☆104 · Updated this week
TransluceAI / circuits
ADAG: Transluce's MLP neuron-level circuit tracing library
☆33 · Apr 10, 2026 · Updated 3 months ago
DFKI-NLP / LLMCheckup
Code for the NAACL 2024 HCI+NLP Workshop paper "LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tool…
☆13 · Mar 24, 2024 · Updated 2 years ago
JasonGross / guarantees-based-mechanistic-interpretability
☆18 · Updated this week
Nix07 / finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…
☆32 · Oct 27, 2025 · Updated 8 months ago
PAIR-code / pretraining-tda
☆33 · Feb 11, 2025 · Updated last year
goodfire-ai / scribe-task-suite
A suite of interpretability tasks to evaluate agents using Scribe for notebook access
☆18 · Oct 2, 2025 · Updated 9 months ago
lacoco-lab / decompiling_transformers
Repo for Paper: Discovering Interpretable Algorithms by Decompiling Transformers to RASP
☆15 · May 25, 2026 · Updated last month
skgabriel / mrf-modeling
Data and models for Misinfo Reaction Frames paper.
☆14 · Jun 9, 2024 · Updated 2 years ago
cadentj / caft
☆25 · Mar 30, 2026 · Updated 3 months ago
ekinakyurek / influence
Code for "Tracing Knowledge in Language Models Back to the Training Data"
☆40 · Dec 27, 2022 · Updated 3 years ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆99 · Dec 31, 2025 · Updated 6 months ago
microsoft / implicitMemory
☆19 · Feb 12, 2026 · Updated 5 months ago
abhi1nandy2 / yesbut_dataset
YesBut - Multimodal Satire Comprehension Dataset
☆19 · Oct 23, 2024 · Updated last year
dayeonki / mt_feedback
Code for "Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations" [NAACL Findings 2024]
☆14 · Apr 3, 2026 · Updated 3 months ago
allenai / few_shot_explanations
Code for NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations"
☆29 · Apr 28, 2023 · Updated 3 years ago
riccardotommasini / imkg
The Internet Memes Knowledge Graph
☆18 · Oct 18, 2024 · Updated last year
vistec-AI / model-releases
☆14 · Jun 22, 2020 · Updated 6 years ago
mlbio-epfl / Aristotelian
We revisit the Platonic Representation Hypothesis using calibrated representational similarity metrics with statistical guarantees.
☆36 · Jun 24, 2026 · Updated 3 weeks ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆68 · Oct 27, 2024 · Updated last year
Multi-Agent-Security-Initiative / thought_virus
☆32 · May 29, 2026 · Updated last month
acl-org / ethics-reading-list
A list of ethics related resources for researchers and practitioners of Natural Language Processing and Computational Linguistics
☆34 · Oct 20, 2025 · Updated 9 months ago
PluviophileYU / CVC-QA
Code for "Counterfactual Variable Control for Robust and Interpretable Question Answering"
☆14 · Oct 13, 2020 · Updated 5 years ago
FarnoushRJ / RelP
[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in La…
☆29 · Nov 3, 2025 · Updated 8 months ago
jbloomAus / SAEDashboard
☆109 · May 23, 2026 · Updated last month
ml-jku / SE-RRM
Symbol-Equivariant Recurrent Reasoning Model
☆16 · Mar 4, 2026 · Updated 4 months ago
apartresearch / specificityplus
👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"
☆20 · Jan 19, 2024 · Updated 2 years ago
hybrid-intelligence / SHARPIE
SHARPIE: Shared Human-AI Reinforcement Learning Platform for Interactive Experiments
☆22 · Updated this week
ShiJiawenwen / JudgeDeceiver
[CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge
☆41 · Sep 17, 2025 · Updated 10 months ago
uclanlp / synpg
Code for our EACL-2021 paper "Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs".
☆38 · Jun 24, 2024 · Updated 2 years ago
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆30 · Apr 2, 2025 · Updated last year
firetto / Walnut
☆12 · Mar 31, 2024 · Updated 2 years ago
nitikam / tangled
Code, data, and additional analysis for the paper Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evalua…
☆15 · Aug 13, 2020 · Updated 5 years ago
technion-cs-nlp / parametric-faithfulness
☆23 · Aug 30, 2025 · Updated 10 months ago
AndreasMadsen / nlp-roar-interpretability
Measuring if attention is explanation with ROAR
☆22 · Mar 3, 2023 · Updated 3 years ago
EleutherAI / attribute
☆16 · Nov 14, 2025 · Updated 8 months ago
interpretingdl / eacl2024_transformer_interpretability_tutorial
Materials for EACL2024 tutorial: Transformer-specific Interpretability
☆66 · Mar 26, 2024 · Updated 2 years ago
wannaphong / IsanNLP
Isan NLP
☆17 · Mar 27, 2024 · Updated 2 years ago