A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository aggregates surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally.
☆197 Mar 4, 2026 Updated last month
Alternatives and similar repositories for Awesome-LMMs-Mechanistic-Interpretability
Users that are interested in Awesome-LMMs-Mechanistic-Interpretability are comparing it to the libraries listed below.
- awesome SAE papers ☆75 May 24, 2025 Updated 10 months ago
- ☆240 Nov 22, 2024 Updated last year
- Official implementation of Visco-Attack (EMNLP 2025 Main). An open-source one-click reproduction script is also provided. ☆30 Apr 11, 2026 Updated last week
- awesome papers in LLM interpretability ☆616 Aug 20, 2025 Updated 7 months ago
- ☆35 Jun 13, 2025 Updated 10 months ago
- A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab… ☆166 Aug 14, 2025 Updated 8 months ago
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆59 Jul 21, 2025 Updated 8 months ago
- ☆22 Sep 16, 2025 Updated 7 months ago
- ScalingOpt - Optimization Community ☆85 Apr 2, 2026 Updated 2 weeks ago
- ☆83 Nov 5, 2024 Updated last year
- [ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… ☆81 Jun 6, 2024 Updated last year
- [ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" ☆102 Nov 30, 2025 Updated 4 months ago
- SFT+RL boosts multimodal reasoning ☆48 Jun 27, 2025 Updated 9 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆62 Mar 30, 2024 Updated 2 years ago
- Ranking-Consistent Language-Image Pretraining ☆12 Oct 24, 2025 Updated 5 months ago
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance" ☆45 Apr 21, 2024 Updated last year
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models ☆52 Nov 17, 2024 Updated last year
- ☆37 Nov 14, 2025 Updated 5 months ago
- High-performance key-value store ☆12 Dec 31, 2018 Updated 7 years ago
- ☆16 Apr 14, 2021 Updated 5 years ago
- Latest Advances on Modality Priors in Multimodal Large Language Models ☆31 Dec 10, 2025 Updated 4 months ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.. ☆301 Jan 22, 2026 Updated 2 months ago
- ☆13 Apr 10, 2025 Updated last year
- A tiny paper rating web ☆40 Mar 19, 2025 Updated last year
- ☆45 Jun 19, 2025 Updated 10 months ago
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons ☆13 Feb 13, 2023 Updated 3 years ago
- Localization of Knowledge in Text-to-Image Models ☆12 Oct 8, 2024 Updated last year
- [ICML2025] Official code for "Reinforced Lifelong Editing for Language Models" ☆21 Feb 23, 2025 Updated last year
- [CVPR 2026] Thinking with Programming Vision: Towards a Unified View for Thinking with Images ☆68 Jan 23, 2026 Updated 2 months ago
- Official PyTorch Implementation for the "What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-mod… ☆20 Sep 26, 2024 Updated last year
- official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition" ☆233 Jun 1, 2025 Updated 10 months ago
- A curated reading list of research in Sparse Autoencoders, Feature Extraction and related topics in Mechanistic Interpretability ☆30 Jan 30, 2025 Updated last year
- The code of the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects" ☆20 May 2, 2025 Updated 11 months ago
- [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts ☆39 Sep 26, 2024 Updated last year
- Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models ☆36 Jun 1, 2025 Updated 10 months ago
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆193 Sep 26, 2025 Updated 6 months ago
- Documentation for EEE Cluster 02 ☆44 Mar 5, 2026 Updated last month
- The Github repo for our survey paper: "Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large… ☆114 Updated this week
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). ☆1,003 Sep 27, 2025 Updated 6 months ago