A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository aggregates surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally.
☆190 · Mar 4, 2026 · Updated this week
Alternatives and similar repositories for Awesome-LMMs-Mechanistic-Interpretability
Users who are interested in Awesome-LMMs-Mechanistic-Interpretability are comparing it to the repositories listed below.
- awesome SAE papers ☆74 · May 24, 2025 · Updated 9 months ago
- ☆232 · Nov 22, 2024 · Updated last year
- Official implementation of Visco-Attack (EMNLP 2025 Main). We will progressively release the code and one-click reproduction scripts. ☆30 · Aug 22, 2025 · Updated 6 months ago
- awesome papers in LLM interpretability ☆609 · Aug 20, 2025 · Updated 6 months ago
- ☆79 · Nov 5, 2024 · Updated last year
- [ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… ☆80 · Jun 6, 2024 · Updated last year
- ☆13 · Apr 10, 2025 · Updated 10 months ago
- Official implementation of "Interpreting and Controlling Vision Foundation Models via Text Explanations" ☆14 · May 29, 2024 · Updated last year
- SFT+RL boosts multimodal reasoning ☆46 · Jun 27, 2025 · Updated 8 months ago
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆54 · Jul 21, 2025 · Updated 7 months ago
- ScalingOpt - Optimization Community ☆80 · Updated this week
- Latest Advances on Modality Priors in Multimodal Large Language Models ☆30 · Dec 10, 2025 · Updated 3 months ago
- ☆37 · Nov 14, 2025 · Updated 3 months ago
- ☆22 · Sep 16, 2025 · Updated 5 months ago
- High-performance key-value store ☆12 · Dec 31, 2018 · Updated 7 years ago
- A very hacky set of functions for getting plotly to do what I want when doing mech interp research, designed to be compatible with PyTorc… ☆13 · Jun 16, 2023 · Updated 2 years ago
- ☆36 · Jun 13, 2025 · Updated 8 months ago
- [AAAI 2025 oral] Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit ☆19 · Apr 19, 2025 · Updated 10 months ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.. ☆294 · Jan 22, 2026 · Updated last month
- [ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" ☆97 · Nov 30, 2025 · Updated 3 months ago
- ☆74 · Oct 1, 2025 · Updated 5 months ago
- A Benchmark Study on Machine Learning Methods for Fake News Detection ☆16 · Jun 8, 2021 · Updated 4 years ago
- The Github repo for our survey paper: "Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large… ☆92 · Jan 30, 2026 · Updated last month
- Collection of Reverse Engineering in Large Model ☆37 · Jan 8, 2025 · Updated last year
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆47 · May 31, 2024 · Updated last year
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆184 · Sep 26, 2025 · Updated 5 months ago
- ☆44 · Jun 19, 2025 · Updated 8 months ago
- [ICML2025] Official code for "Reinforced Lifelong Editing for Language Models" ☆21 · Feb 23, 2025 · Updated last year
- Paper list for the paper "Authorship Attribution in the Era of Large Language Models: Problems, Methodologies, and Challenges (SIGKDD Exp… ☆18 · Dec 23, 2024 · Updated last year
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking ☆36 · Jan 20, 2026 · Updated last month
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance" ☆44 · Apr 21, 2024 · Updated last year
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). ☆986 · Sep 27, 2025 · Updated 5 months ago
- ☆17 · Apr 14, 2021 · Updated 4 years ago
- A tiny paper rating web ☆40 · Mar 19, 2025 · Updated 11 months ago
- official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition" ☆234 · Jun 1, 2025 · Updated 9 months ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. ☆85 · Jan 19, 2025 · Updated last year
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models ☆51 · Nov 17, 2024 · Updated last year
- [FCS'24] LVLM Safety paper ☆19 · Jan 4, 2025 · Updated last year
- ☆73 · Jul 24, 2025 · Updated 7 months ago