rachtibat / LRP-eXplains-TransformersLinks

Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024]

☆203

Alternatives and similar repositories for LRP-eXplains-Transformers

Users that are interested in LRP-eXplains-Transformers are comparing it to the libraries listed below

Sorting:

rachtibat / zennit-crp
An eXplainable AI toolkit with Concept Relevance Propagation and Relevance Maximization
☆138Updated last year
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆283Updated 2 years ago
lkopf / cosy
[NeurIPS 2024] CoSy is an automatic evaluation framework for textual explanations of neurons.
☆18Updated 5 months ago
ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
☆382Updated last year
AmeenAli / XAI_Transformers
Official Code Implementation of the paper : XAI for Transformers: Better Explanations through Conservative Propagation
☆67Updated 3 years ago
rushrukh / awesome-explainable-ai
A repository for summaries of recent explainable AI/Interpretable ML approaches
☆86Updated last year
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆140Updated last year
chr5tphr / zennit
Zennit is a high-level framework in Python using PyTorch for explaining/exploring neural networks using attribution methods like LRP.
☆235Updated 3 months ago
adamkarvonen / SAEBench
☆136Updated this week
neuroexplicit-saar / Discover-then-Name
Code for the paper: Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery. ECCV 2024.
☆52Updated last year
KempnerInstitute / overcomplete
👋 Overcomplete is a Vision-based SAE Toolbox
☆101Updated 2 weeks ago
jim-berend / semanticlens
Mechanistic understanding and validation of large AI models with SemanticLens
☆46Updated 2 months ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆284Updated last year
dilyabareeva / quanda
A toolkit for quantitative evaluation of data attribution methods.
☆54Updated 4 months ago
sciai-lab / Truth_is_Universal
☆28Updated last year
FarnoushRJ / RelP
[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in La…
☆17Updated 2 weeks ago
wesg52 / sparse-probing-paper
Sparse probing paper full code.
☆65Updated last year
jjcherian / conformal-safety
☆32Updated 11 months ago
interpretingdl / eacl2024_transformer_interpretability_tutorial
Materials for EACL2024 tutorial: Transformer-specific Interpretability
☆60Updated last year
annahedstroem / MetaQuantus
MetaQuantus is an XAI performance tool to identify reliable evaluation metrics
☆39Updated last year
maxdreyer / PURE
Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works…
☆19Updated last year
saprmarks / dictionary_learning
☆364Updated 3 months ago
i-gallegos / Fair-LLM-Benchmark
☆156Updated 2 years ago
ykwon0407 / DataInf
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)
☆75Updated last year
KihoPark / linear_rep_geometry
☆110Updated 9 months ago
MadryLab / trak
A fast, effective data attribution method for neural networks in PyTorch
☆220Updated last year
Dakingrai / awesome-mechanistic-interpretability-lm-papers
☆214Updated last year
kaifishr / PyTorchRelevancePropagation
A basic implementation of Layer-wise Relevance Propagation (LRP) in PyTorch.
☆98Updated 3 years ago
yewsiang / ConceptBottleneck
Concept Bottleneck Models, ICML 2020
☆227Updated 2 years ago
AI4LIFE-GROUP / OpenXAI
OpenXAI : Towards a Transparent Evaluation of Model Explanations
☆249Updated last year