Landing page for MIB: A Mechanistic Interpretability Benchmark
☆24 · Aug 15, 2025 · Updated 6 months ago
Alternatives and similar repositories for MIB
Users who are interested in MIB are comparing it to the repositories listed below.
- ☆23 · Jun 30, 2025 · Updated 8 months ago
- ☆32 · Feb 15, 2026 · Updated 2 weeks ago
- Code for the NAACL 2024 HCI+NLP Workshop paper "LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tool… ☆13 · Mar 24, 2024 · Updated last year
- ☆17 · Aug 30, 2025 · Updated 6 months ago
- ☆24 · Oct 3, 2025 · Updated 4 months ago
- [NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in La… ☆27 · Nov 3, 2025 · Updated 3 months ago
- ☆22 · Feb 13, 2026 · Updated 2 weeks ago
- (NAACL 2024) Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations ☆15 · Apr 14, 2025 · Updated 10 months ago
- ☆33 · Jul 9, 2025 · Updated 7 months ago
- ☆17 · Updated this week
- 👩💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark" ☆20 · Jan 19, 2024 · Updated 2 years ago
- Measuring if attention is explanation with ROAR ☆22 · Mar 3, 2023 · Updated 2 years ago
- Efficiently computing & storing token n-grams from large corpora ☆26 · Oct 6, 2024 · Updated last year
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆30 · Oct 27, 2025 · Updated 4 months ago
- A list of ethics related resources for researchers and practitioners of Natural Language Processing and Computational Linguistics ☆33 · Oct 20, 2025 · Updated 4 months ago
- ☆24 · May 15, 2021 · Updated 4 years ago
- Code for "Don't trust your eyes: on the (un)reliability of feature visualizations" (ICML 2024) ☆34 · Nov 15, 2023 · Updated 2 years ago
- ☆32 · Feb 11, 2025 · Updated last year
- ☆30 · Feb 11, 2022 · Updated 4 years ago
- ☆29 · Jan 12, 2026 · Updated last month
- ☆150 · Dec 30, 2025 · Updated 2 months ago
- ☆13 · Oct 5, 2025 · Updated 4 months ago
- A library for efficient patching and automatic circuit discovery. ☆90 · Dec 31, 2025 · Updated 2 months ago
- Attribution-based Parameter Decomposition ☆34 · Jun 11, 2025 · Updated 8 months ago
- Code for NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations" ☆31 · Apr 28, 2023 · Updated 2 years ago
- ☆140 · Aug 4, 2024 · Updated last year
- ☆83 · Feb 25, 2025 · Updated last year
- 🪝PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models ☆12 · May 30, 2025 · Updated 9 months ago
- How much is the footprint of a piece of software? This script scans the process statistics for the appearance of a given command name and… ☆12 · Nov 16, 2023 · Updated 2 years ago
- Arabic News Stance Corpus ☆11 · Feb 5, 2021 · Updated 5 years ago
- Code and Data for Evaluation WG ☆42 · May 4, 2022 · Updated 3 years ago
- Code for our EACL-2021 paper "Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs". ☆38 · Jun 24, 2024 · Updated last year
- Open source interpretability artefacts for R1. ☆171 · Apr 21, 2025 · Updated 10 months ago
- A framework for evaluating Machine Translation models. ☆12 · May 26, 2025 · Updated 9 months ago
- Implementing LRP (Layer-wise Relevance Propagation) for a sequence-to-sequence model with GRU layers. ☆12 · Sep 8, 2023 · Updated 2 years ago
- DL Backtrace is a new explainability technique for deep learning models that works for any modality and model type. ☆23 · Feb 16, 2026 · Updated 2 weeks ago
- ☆12 · Aug 15, 2023 · Updated 2 years ago
- Residual Quantization Autoencoder, used for interpreting LLMs ☆14 · Jan 1, 2025 · Updated last year
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆39 · Dec 27, 2022 · Updated 3 years ago