A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository aggregates surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally.
☆198Mar 4, 2026Updated 2 months ago
Alternatives and similar repositories for Awesome-LMMs-Mechanistic-Interpretability
Users that are interested in Awesome-LMMs-Mechanistic-Interpretability are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- awesome SAE papers☆77May 24, 2025Updated 11 months ago
- ☆241Nov 22, 2024Updated last year
- Official implementation of Visco-Attack (EMNLP 2025 Main). An open-source one-click reproduction script is also provided.☆30Apr 11, 2026Updated 3 weeks ago
- awesome papers in LLM interpretability☆618Aug 20, 2025Updated 8 months ago
- ☆35Jun 13, 2025Updated 10 months ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab…☆171Aug 14, 2025Updated 8 months ago
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety☆60Jul 21, 2025Updated 9 months ago
- ☆22Sep 16, 2025Updated 7 months ago
- ScalingOpt - Optimization Community☆94May 2, 2026Updated last week
- [ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…☆81Jun 6, 2024Updated last year
- [AAAI 2025 oral] Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit☆19Apr 19, 2025Updated last year
- [ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"☆102Nov 30, 2025Updated 5 months ago
- SFT+RL boosts multimodal reasoning☆48Jun 27, 2025Updated 10 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆62Mar 30, 2024Updated 2 years ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- Ranking-Consistent Language-Image Pretraining☆13Oct 24, 2025Updated 6 months ago
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"☆46Apr 21, 2024Updated 2 years ago
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models☆52Nov 17, 2024Updated last year
- This repository collects all relevant resources about interpretability in LLMs☆396Nov 1, 2024Updated last year
- ☆37Nov 14, 2025Updated 5 months ago
- ☆16Apr 14, 2021Updated 5 years ago
- Latest Advances on Modality Priors in Multimodal Large Language Models☆31Dec 10, 2025Updated 4 months ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc..☆303Jan 22, 2026Updated 3 months ago
- ☆13Apr 10, 2025Updated last year
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- ☆45Jun 19, 2025Updated 10 months ago
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons☆13Feb 13, 2023Updated 3 years ago
- Localization of Knowledge in Text-to-Image Models☆12Oct 8, 2024Updated last year
- [ICML2025] Official code for "Reinforced Lifelong Editing for Language Models"☆21Feb 23, 2025Updated last year
- [CVPR 2026] Thinking with Programming Vision: Towards a Unified View for Thinking with Images☆69Jan 23, 2026Updated 3 months ago
- Official PyTorch Implementation for the "What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-mod…☆20Sep 26, 2024Updated last year
- official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"☆234Jun 1, 2025Updated 11 months ago
- A curated reading list of research in Sparse Autoencoders, Feature Extraction and related topics in Mechanistic Interpretability☆31Jan 30, 2025Updated last year
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking☆38Jan 20, 2026Updated 3 months ago
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- The code of the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects"☆19May 2, 2025Updated last year
- [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts☆40Sep 26, 2024Updated last year
- (NeurIPS 2025) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation☆67Oct 14, 2025Updated 6 months ago
- Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models☆36Jun 1, 2025Updated 11 months ago
- Accepted by IJCAI-24 Survey Track☆231Aug 25, 2024Updated last year
- Documentation for EEE Cluster 02☆46May 1, 2026Updated last week
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.☆198Sep 26, 2025Updated 7 months ago