rachtibat / LRP-eXplains-Transformers
Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024]
☆213 · Updated 5 months ago
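The repository applies Layer-wise Relevance Propagation (LRP) to LLMs and Vision Transformers. As a rough illustration of the underlying idea only (not this library's API), the ε-rule for a single linear layer can be sketched in plain NumPy; the function name and toy tensors below are hypothetical:

```python
import numpy as np

def lrp_epsilon(a, W, b, R_out, eps=1e-9):
    """Hypothetical sketch of the epsilon-rule for one linear layer z = a @ W + b.

    Redistributes the layer's output relevance R_out back to its inputs in
    proportion to each input's contribution to the pre-activation z.
    """
    z = a @ W + b                               # forward pre-activations
    z = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabilize the division near zero
    s = R_out / z                               # relevance per unit of activation
    return a * (W @ s)                          # input relevance R_in

# toy check: with zero bias, total relevance is approximately conserved
rng = np.random.default_rng(0)
a = rng.normal(size=4)
W = rng.normal(size=(4, 3))
R_out = rng.normal(size=3)
R_in = lrp_epsilon(a, W, np.zeros(3), R_out)
```

Conservation (ΣR_in ≈ ΣR_out, up to the ε stabilizer and bias terms) is the property that distinguishes LRP-style rules from plain gradient attribution.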
Alternatives and similar repositories for LRP-eXplains-Transformers
Users interested in LRP-eXplains-Transformers are comparing it to the libraries listed below.
- An eXplainable AI toolkit with Concept Relevance Propagation and Relevance Maximization ☆140 · Updated last year
- Official code implementation of the paper: XAI for Transformers: Better Explanations through Conservative Propagation ☆67 · Updated 3 years ago
- Mechanistic understanding and validation of large AI models with SemanticLens ☆48 · Updated last month
- ☆138 · Updated last week
- Code for the paper: Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery (ECCV 2024) ☆53 · Updated last year
- Using sparse coding to find distributed representations used by neural networks ☆289 · Updated 2 years ago
- ☆32 · Updated last year
- Zennit is a high-level Python framework built on PyTorch for explaining and exploring neural networks with attribution methods such as LRP ☆239 · Updated 5 months ago
- 👋 Overcomplete is a vision-based SAE toolbox ☆112 · Updated last month
- A toolkit for quantitative evaluation of data attribution methods ☆54 · Updated 5 months ago
- [NeurIPS 2024] CoSy is an automatic evaluation framework for textual explanations of neurons ☆19 · Updated 6 months ago
- This repository collects all relevant resources on interpretability in LLMs ☆389 · Updated last year
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆79 · Updated last year
- A repository of summaries of recent explainable AI / interpretable ML approaches ☆88 · Updated last year
- A fast, effective data attribution method for neural networks in PyTorch ☆224 · Updated last year
- Code for the paper "Post-hoc Concept Bottleneck Models" (Spotlight @ ICLR 2023) ☆89 · Updated last year
- Sparse Autoencoder for Mechanistic Interpretability ☆285 · Updated last year
- 🪄 Interpreto is an interpretability toolbox for LLMs ☆95 · Updated 2 weeks ago
- Materials for the EACL 2024 tutorial: Transformer-specific Interpretability ☆61 · Updated last year
- A resource repository for representation engineering in large language models ☆145 · Updated last year
- MetaQuantus is an XAI performance tool for identifying reliable evaluation metrics ☆40 · Updated last year
- ☆380 · Updated 4 months ago
- [NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in La… ☆23 · Updated 2 months ago
- ☆200 · Updated 2 months ago
- Sparse probing paper full code ☆66 · Updated 2 years ago
- ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs) ☆331 · Updated 5 months ago
- [ICLR 2025] General-purpose activation steering library ☆133 · Updated 3 months ago
- Repository for our NeurIPS 2022 paper "Concept Embedding Models", our NeurIPS 2023 paper "Learning to Receive Help", and our ICML 2025 pa… ☆72 · Updated 2 months ago
- ☆23 · Updated 4 months ago
- Conformal Language Modeling ☆32 · Updated 2 years ago
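Several entries above (the sparse-coding repository, Overcomplete, and "Sparse Autoencoder for Mechanistic Interpretability") revolve around sparse autoencoders for decomposing model activations. As a hedged, minimal NumPy sketch of the core computation, not any listed library's implementation: a ReLU encoder into an overcomplete dictionary, a linear decoder, and a reconstruction-plus-sparsity loss. All names and sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 16, 64                      # activation dim, overcomplete dictionary size (h > d)
W_enc = rng.normal(scale=0.1, size=(d, h))
W_dec = rng.normal(scale=0.1, size=(h, d))
b_enc = np.zeros(h)

def sae_forward(x):
    """One sparse-autoencoder pass: ReLU encoder, linear decoder."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # sparse feature activations
    x_hat = f @ W_dec                        # reconstruction of the activation
    return f, x_hat

x = rng.normal(size=d)                       # a single model activation vector
f, x_hat = sae_forward(x)
# training objective: reconstruction error + L1 penalty encouraging sparse features
loss = np.sum((x - x_hat) ** 2) + 1e-3 * np.abs(f).sum()
```

The L1 term pushes most features to exactly zero (via the ReLU), so each activation is explained by a small set of dictionary directions, which is the property these interpretability toolkits exploit.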