itsqyh / Awesome-LMMs-Mechanistic-Interpretability
☆34 · Updated 2 months ago
Alternatives and similar repositories for Awesome-LMMs-Mechanistic-Interpretability:
Users who are interested in Awesome-LMMs-Mechanistic-Interpretability are comparing it to the repositories listed below.
- A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab… ☆75 · Updated 2 months ago
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆28 · Updated 2 months ago
- Latest Advances on Modality Priors in Multimodal Large Language Models ☆16 · Updated last week
- awesome SAE papers ☆27 · Updated 2 months ago
- A curated list of resources for activation engineering ☆67 · Updated last month
- Code for the EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models ☆31 · Updated 5 months ago
- 📜 Paper list on decoding methods for LLMs and LVLMs ☆42 · Updated last week
- Reinforcement learning code for the SPA-VL dataset ☆32 · Updated 10 months ago
- [ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" ☆46 · Updated 4 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆37 · Updated 10 months ago
- ☆47 · Updated 5 months ago
- ☆94 · Updated 3 weeks ago
- A Survey on the Honesty of Large Language Models ☆57 · Updated 5 months ago
- ☆29 · Updated 2 months ago
- Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆20 · Updated last month
- This repo contains the code for the paper "Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attrib… ☆16 · Updated last month
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆54 · Updated 5 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆57 · Updated last year
- Implementation code for ACL 2024: Advancing Parameter Efficiency in Fine-tuning via Representation Editing ☆14 · Updated last year
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆78 · Updated last week
- Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering ☆52 · Updated 5 months ago
- [ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality ☆24 · Updated 3 weeks ago
- ☆26 · Updated 6 months ago
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆21 · Updated 4 months ago
- Chain of Thought (CoT) is so hot! So long! We need a short reasoning process! ☆50 · Updated last month
- Papers about Hallucination in Multi-Modal Large Language Models (MLLMs) ☆89 · Updated 5 months ago
- ☆59 · Updated 3 weeks ago
- ☆10 · Updated 2 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆72 · Updated 2 months ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding" ☆86 · Updated 5 months ago