Trustworthy-ML-Lab / CB-LLMs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.
☆28 Updated 4 months ago
Alternatives and similar repositories for CB-LLMs
Users who are interested in CB-LLMs are comparing it to the repositories listed below.
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆74 Updated 10 months ago
- [ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" ☆96 Updated last month
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] ☆51 Updated 2 weeks ago
- ☆77 Updated last year
- AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024. ☆98 Updated last year
- Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering ☆99 Updated last year
- A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg… ☆173 Updated 2 months ago
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆108 Updated 2 years ago
- ☆67 Updated 5 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆46 Updated last year
- LCA-on-the-line (ICML 2024 Oral) ☆13 Updated 10 months ago
- Less is More: High-value Data Selection for Visual Instruction Tuning ☆17 Updated 11 months ago
- ☆68 Updated 10 months ago
- ☆60 Updated 5 months ago
- A library of visualization tools for the interpretability and hallucination analysis of large vision-language models (LVLMs). ☆42 Updated 7 months ago
- ☆43 Updated 2 years ago
- [NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models ☆57 Updated last month
- ☆44 Updated 6 months ago
- A curated list of resources for activation engineering ☆120 Updated 3 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆62 Updated last year
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆86 Updated 3 months ago
- MokA: Multimodal Low-Rank Adaptation for MLLMs ☆62 Updated last week
- [NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forget-retain Pareto Optimality ☆18 Updated 2 months ago
- ☆13 Updated 9 months ago
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆38 Updated 5 months ago
- ☆62 Updated last year
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆34 Updated 10 months ago
- Collection of Reverse Engineering in Large Model ☆36 Updated last year
- ☆70 Updated last year
- XL-VLMs: General Repository for eXplainable Large Vision Language Models ☆45 Updated 4 months ago