zepingyu0512/awesome-llm-understanding-mechanism

awesome papers in LLM interpretability

☆623

Alternatives and similar repositories for awesome-llm-understanding-mechanism

Users who are interested in awesome-llm-understanding-mechanism are comparing it to the repositories listed below.

zepingyu0512 / awesome-SAE
awesome SAE papers
☆78 · May 24, 2025 · Updated last year
zepingyu0512 / awesome-LLM-neuron
☆36 · Jun 13, 2025 · Updated last year
cooperleong00 / Awesome-LLM-Interpretability
A curated list of LLM interpretability material: tutorials, libraries, surveys, papers, blogs, etc.
☆308 · Jan 22, 2026 · Updated 5 months ago
zepingyu0512 / neuron-attribution
Code for the EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆52 · Nov 17, 2024 · Updated last year
Dakingrai / awesome-mechanistic-interpretability-lm-papers
☆259 · Nov 22, 2024 · Updated last year
ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
☆402 · Nov 1, 2024 · Updated last year
JShollaj / awesome-llm-interpretability
A curated list of Large Language Model (LLM) Interpretability resources.
☆1,629 · Feb 24, 2026 · Updated 4 months ago
TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆3,695 · Updated this week
itsqyh / Awesome-LMMs-Mechanistic-Interpretability
A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg…
☆215 · Mar 4, 2026 · Updated 4 months ago
zjunlp / KnowledgeEditingPapers
Must-read Papers on Knowledge Editing for Large Language Models.
☆1,238 · Jun 25, 2026 · Updated 3 weeks ago
IAAR-Shanghai / Awesome-Attention-Heads
An awesome repository and a comprehensive survey on the interpretability of LLM attention heads.
☆412 · Mar 2, 2025 · Updated last year
LLM-MI-Research / Actionable-MI
☆15 · Jan 20, 2026 · Updated 6 months ago
ZFancy / awesome-activation-engineering
A curated list of resources for activation engineering
☆140 · Oct 2, 2025 · Updated 9 months ago
decoderesearch / SAELens
Training Sparse Autoencoders on Language Models
☆1,477 · Updated this week
zjunlp / EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
☆2,881 · Jul 14, 2026 · Updated last week
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆306 · Nov 10, 2023 · Updated 2 years ago
rattlesnakey / Awesome-Actionable-MI-Survey
The GitHub repo for our survey paper: "Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large…
☆149 · Apr 15, 2026 · Updated 3 months ago
saprmarks / feature-circuits
☆223 · Oct 14, 2025 · Updated 9 months ago
zepingyu0512 / arithmetic-mechanism
Code for the EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
☆12 · Nov 17, 2024 · Updated last year
hannamw / eap-ig-faithfulness
Code for "Automatic Circuit Finding and Faithfulness"
☆18 · Jul 11, 2024 · Updated 2 years ago
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆156 · Nov 14, 2024 · Updated last year
likenneth / honest_llama
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
☆581 · Jan 28, 2025 · Updated last year
ydyjya / LLM-IHS-Explanation
☆60 · Jun 13, 2024 · Updated 2 years ago
edenbiran / HoppingTooLate
Exploring the Limitations of Large Language Models on Multi-Hop Queries
☆33 · Mar 2, 2025 · Updated last year
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆199 · Apr 30, 2026 · Updated 2 months ago
lone17 / angular-steering
[WIP] [NeurIPS 2025 Spotlight] Angular Steering: Behavior Control via Rotation in Activation Space
☆25 · May 25, 2026 · Updated last month
interpretingdl / eacl2024_transformer_interpretability_tutorial
Materials for EACL2024 tutorial: Transformer-specific Interpretability
☆66 · Mar 26, 2024 · Updated 2 years ago
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆732 · Updated this week
ydyjya / Awesome-LLM-Safety
A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide…
☆1,889 · Jul 12, 2026 · Updated last week
atfortes / Awesome-LLM-Reasoning
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
☆3,649 · Apr 20, 2026 · Updated 3 months ago
aypan17 / latentqa
☆34 · Nov 16, 2025 · Updated 8 months ago
Glaciohound / LM-Steer
Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)
☆149 · Jul 13, 2025 · Updated last year
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆266 · Updated this week
stanfordnlp / pyvene
Stanford NLP Python library for understanding and improving PyTorch models via interventions
☆892 · Mar 6, 2026 · Updated 4 months ago
openai / sparse_autoencoder
☆595 · Jul 19, 2024 · Updated 2 years ago
zjunlp / KnowledgeCircuits
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
☆172 · Nov 14, 2025 · Updated 8 months ago
hannamw / EAP-IG
☆82 · May 23, 2026 · Updated last month
Furyton / awesome-language-model-analysis
This paper list focuses on the theoretical and empirical analysis of language models, especially large language models (LLMs). The papers…
☆101 · Updated this week
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆63 · Mar 30, 2024 · Updated 2 years ago