rattlesnakey/Awesome-Actionable-MI-Survey

The GitHub repo for our survey paper: "Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models"

☆150

Alternatives and similar repositories for Awesome-Actionable-MI-Survey

Users who are interested in Awesome-Actionable-MI-Survey are comparing it to the repositories listed below.

LLM-MI-Research / Actionable-MI
☆15 · Jan 20, 2026 · Updated 6 months ago
zzhang0179 / Unveiling-Linguistic-Regions-in-LLMs
[ACL 2024] Unveiling Linguistic Regions in Large Language Models
☆34 · Jun 9, 2024 · Updated 2 years ago
MasterVito / DAC-RL
Official Repo for DAC-RL: Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability
☆16 · Feb 26, 2026 · Updated 5 months ago
kojima-takeshi188 / lang_neuron
☆21 · Jun 24, 2024 · Updated 2 years ago
SWE-Lego / SWE-Lego
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving
☆71 · Feb 28, 2026 · Updated 5 months ago
deeplearning-wisc / LUMINA
Official implementation of ICLR 2026 paper "LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals"
☆18 · Jan 31, 2026 · Updated 5 months ago
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆181 · Sep 18, 2025 · Updated 10 months ago
zepingyu0512 / awesome-llm-understanding-mechanism
awesome papers in LLM interpretability
☆624 · Aug 20, 2025 · Updated 11 months ago
killthefullmoon / PhyX
PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
☆54 · Mar 16, 2026 · Updated 4 months ago
PKU-PILLAR-Group / Survey-Intrinsic-Interpretability-of-LLMs
Paper List for our ACL 2026 paper "Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Archite…
☆16 · Apr 23, 2026 · Updated 3 months ago
itsqyh / Awesome-LMMs-Mechanistic-Interpretability
A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg…
☆215 · Mar 4, 2026 · Updated 4 months ago
UKPLab / tmlr2026-manifold-analysis
☆21 · Mar 3, 2026 · Updated 4 months ago
menik1126 / UNComp
[EMNLP 2025🔥] UNComp: Can Matrix Entropy Uncover Sparsity? -- A Compressor Design from an Uncertainty-Aware Perspective
☆20 · Jan 7, 2026 · Updated 6 months ago
zepingyu0512 / arithmetic-mechanism
code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
☆12 · Nov 17, 2024 · Updated last year
adamkarvonen / activation_oracles
☆96 · Apr 18, 2026 · Updated 3 months ago
AI4LIFE-GROUP / temporal-saes
Codebase for Temporal SAEs paper
☆24 · Nov 14, 2025 · Updated 8 months ago
ZJU-REAL / EasySteer
A Unified Framework for High-Performance and Extensible LLM Steering
☆288 · Apr 30, 2026 · Updated 2 months ago
zepingyu0512 / neuron-attribution
code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆52 · Nov 17, 2024 · Updated last year
AI-in-Transportation-Lab / awesome-mechanistic-interpretability
A carefully curated collection of high-quality libraries, projects, tutorials, research papers, and other essential resources focused on …
☆130 · Updated this week
AlphaLab-USTC / AlphaSteer
[ICLR 2026] The implementation of paper "AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint"
☆61 · Nov 20, 2025 · Updated 8 months ago
cvenhoff / vlm-mapping
☆19 · Jun 20, 2025 · Updated last year
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
cunliangkong / linux-envs
personal settings for linux tools, including zsh, vim, tmux, pip.
☆11 · Dec 2, 2019 · Updated 6 years ago
TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆3,723 · Updated this week
toltoxgh / CoreNLP-jMWE
Stanford CoreNLP annotator implementing jMWE for detecting Multi-Word Expressions / collocations
☆15 · Jan 6, 2017 · Updated 9 years ago
huhailinguist / ArguGPT
☆22 · Sep 25, 2023 · Updated 2 years ago
decoderesearch / SAELens
Training Sparse Autoencoders on Language Models
☆1,487 · Updated this week
shengliu66 / FractionalReason
Official GitHub repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17 · Jun 30, 2025 · Updated last year
jkutaso / SHADE-Arena
☆57 · May 9, 2025 · Updated last year
CMarsRover / SciAgentGYM
Code for Paper: Benchmarking Multi-step Scientific Tool-use in LLM Agents
☆37 · Jul 5, 2026 · Updated 3 weeks ago
Shwai-He / PAD-Net
Source code of ACL 2023 Main Conference Paper "PAD-Net: An Efficient Framework for Dynamic Networks".
☆14 · Feb 28, 2026 · Updated 5 months ago
january-blue / OpenNovelty
☆135 · May 12, 2026 · Updated 2 months ago
SKURA502 / sae-analysis
A toolkit for systematically understanding the concepts encoded in Sparse Autoencoders.
☆20 · Apr 5, 2026 · Updated 3 months ago
liusida / ica-lens-paper
ICA Lens: compact ICA-based interpretability tools for exploring LLM activations. Code release for the paper.
☆38 · Jul 5, 2026 · Updated 3 weeks ago
dadelani / sib-200
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
☆26 · May 20, 2026 · Updated 2 months ago
ASTRAL-Group / MonitorBench
[COLM 2026] Official implementation for "MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Mo…
☆20 · Apr 23, 2026 · Updated 3 months ago
r-three / AttriBoT
Code for AttriBoT from "AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution"
☆15 · Apr 21, 2025 · Updated last year
AheadOFpotato / Awesome-LRM-Mechanisms
Towards a Mechanistic Understanding of Large Reasoning Models: A Survey of Training, Inference, and Failures
☆34 · Jan 29, 2026 · Updated 6 months ago
icip-cas / ReasoningLens
ReasoningLens: a user-friendly toolkit to visualize, understand, and debug model reasoning chains.
☆25 · Jul 7, 2026 · Updated 3 weeks ago