FarnoushRJ/RelP

[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching"

☆29

Alternatives and similar repositories for RelP

Users who are interested in RelP are comparing it to the libraries listed below.

gumityolcu / dualxda-pip
PyPI package for DualXDA for efficient data attribution and feature-level explanations of training data influence
☆22 · Mar 6, 2026 · Updated 4 months ago
jim-berend / semanticlens
Mechanistic understanding and validation of large AI models with SemanticLens
☆54 · Dec 4, 2025 · Updated 7 months ago
TransluceAI / circuits
ADAG: Transluce's MLP neuron-level circuit tracing library
☆33 · Apr 10, 2026 · Updated 3 months ago
lkopf / cosy
[NeurIPS 2024] CoSy is an automatic evaluation framework for textual explanations of neurons.
☆20 · Jan 28, 2026 · Updated 5 months ago
maxdreyer / Reveal2Revise
Reveal to Revise: An Explainable AI Life Cycle for Iterative Bias Correction of Deep Models. Paper presented at MICCAI 2023 conference.
☆20 · Jan 17, 2024 · Updated 2 years ago
seulkiyeom / LRP_Pruning_toy_example
Pruning CNN using CNN with toy example
☆23 · Jun 21, 2021 · Updated 5 years ago
cadentj / caft
☆25 · Mar 30, 2026 · Updated 3 months ago
chr5tphr / zennit
Zennit is a high-level framework in Python using PyTorch for explaining/exploring neural networks using attribution methods like LRP.
☆247 · May 13, 2026 · Updated 2 months ago
FarnoushRJ / MambaLRP
[NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models" 🐍
☆47 · Nov 6, 2024 · Updated last year
science-of-finetuning / diffing-toolkit
A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.
☆78 · Updated this week
technion-cs-nlp / parametric-faithfulness
☆23 · Aug 30, 2025 · Updated 10 months ago
rachtibat / LRP-eXplains-Transformers
Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024]
☆241 · Jul 11, 2025 · Updated last year
aleks-krasowski / PINNfluence
☆17 · Jun 3, 2026 · Updated last month
maxdreyer / pcx
Prototypical Concept-based Explanations, accepted at SAIAD workshop at CVPR 2024.
☆16 · Feb 20, 2026 · Updated 5 months ago
maxdreyer / PURE
Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works…
☆20 · May 29, 2024 · Updated 2 years ago
Heidelberg-NLP / CC-SHAP
Code for "On Measuring Faithfulness of Natural Language Explanations"
☆23 · Jul 14, 2026 · Updated last week
sobieskibj / kbdm
☆15 · Nov 3, 2025 · Updated 8 months ago
wbopan / safety-residual-space
Multi-dimensional analysis of orthogonal safety directions in LLM alignment
☆22 · Jun 12, 2026 · Updated last month
annahedstroem / MetaQuantus
MetaQuantus is an XAI performance tool to identify reliable evaluation metrics
☆44 · Apr 17, 2024 · Updated 2 years ago
keing1 / reward-hack-generalization
Datasets used in the paper "Reward hacking behavior can generalize across tasks"
☆15 · Aug 17, 2025 · Updated 11 months ago
Betswish / MIRAGE
Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/
☆25 · Mar 10, 2025 · Updated last year
yoavgur / PISCES
🪝PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models
☆13 · Jun 28, 2026 · Updated 3 weeks ago
dilyabareeva / quanda
A toolkit for quantitative evaluation of data attribution methods.
☆60 · May 11, 2026 · Updated 2 months ago
AmeenAli / XAI_Transformers
Official code implementation of the paper "XAI for Transformers: Better Explanations through Conservative Propagation"
☆67 · Feb 14, 2022 · Updated 4 years ago
Jiaxin-Wen / MisleadLM
Official code for our paper "Language Models Learn to Mislead Humans via RLHF"
☆20 · Oct 11, 2024 · Updated last year
samyadeepbasu / LocoGen
Localization of Knowledge in Text-to-Image Models
☆11 · Oct 8, 2024 · Updated last year
NLeSC / XAI
Prototyping about eXplainable Artificial Intelligence (XAI)
☆25 · Jan 5, 2023 · Updated 3 years ago
anthropics / sycophancy-to-subterfuge-paper
☆28 · Sep 5, 2024 · Updated last year
MadryLab / AT2
Attribute statements generated by LLMs to preceding tokens using attention weights.
☆28 · Apr 22, 2025 · Updated last year
lukasgarbas / can-we-tune-together
Combining encoder-based language models
☆11 · Nov 11, 2021 · Updated 4 years ago
virelay / corelay
CoRelAy is a tool to compose small-scale (single-machine) analysis pipelines.
☆32 · Apr 30, 2026 · Updated 2 months ago
mim-uw / eXplainableMachineLearning-2024
eXplainable Machine Learning 2023/24 at MIM UW
☆22 · Feb 3, 2024 · Updated 2 years ago
zouharvi / subset2evaluate
Find informative examples to efficiently (human)-evaluate NLG models.
☆17 · Apr 22, 2026 · Updated 2 months ago
mt-upc / transformer-contributions-nmt
☆18 · Oct 6, 2022 · Updated 3 years ago
Multi-Agent-Security-Initiative / thought_virus
☆32 · May 29, 2026 · Updated last month
hannamw / MIB-circuit-track
☆24 · Jun 30, 2025 · Updated last year
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Jul 18, 2024 · Updated 2 years ago
mini-pw / 2023L-ExploratoryDataAnalysis
Introduction to exploratory data analysis course for Mathematics and data analysis studies in Spring 2022/2023
☆16 · Aug 8, 2023 · Updated 2 years ago
McGill-NLP / latentlens
Code and data for the paper "LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs"
☆48 · Mar 31, 2026 · Updated 3 months ago