FarnoushRJ / RelP
[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching"
⭐25 · Updated 3 months ago
Alternatives and similar repositories for RelP
Users who are interested in RelP are comparing it to the libraries listed below.
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers ⭐42 · Updated 11 months ago
- PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models ⭐12 · Updated 8 months ago
- Materials for EACL2024 tutorial: Transformer-specific Interpretability ⭐63 · Updated last year
- ⭐116 · Updated 11 months ago
- ⭐32 · Updated 11 months ago
- Measuring the Mixing of Contextual Information in the Transformer ⭐34 · Updated 2 years ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ⭐57 · Updated 3 months ago
- ⭐25 · Updated 9 months ago
- Landing page for MIB: A Mechanistic Interpretability Benchmark ⭐24 · Updated 5 months ago
- A toolkit for quantitative evaluation of data attribution methods. ⭐55 · Updated 6 months ago
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ⭐219 · Updated 6 months ago
- Mechanistic understanding and validation of large AI models with SemanticLens ⭐50 · Updated 2 months ago
- Simple and scalable tools for data-driven pretraining data selection. ⭐29 · Updated 8 months ago
- ⭐83 · Updated 11 months ago
- ⭐23 · Updated 5 months ago
- ⭐143 · Updated last month
- Overcomplete is a Vision-based SAE Toolbox ⭐119 · Updated 2 months ago
- ⭐58 · Updated last year
- ⭐63 · Updated 4 years ago
- Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs) ⭐60 · Updated 6 months ago
- [NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models" ⭐45 · Updated last year
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works… ⭐19 · Updated last year
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ⭐178 · Updated 7 months ago
- Unified access to Large Language Model modules using NNsight ⭐87 · Updated last week
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ⭐12 · Updated last year
- DecompX: Explaining Transformers Decisions by Propagating Token Decomposition [ACL 2023] ⭐19 · Updated 7 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ⭐241 · Updated 2 weeks ago
- ⭐132 · Updated 2 years ago
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ⭐39 · Updated 3 years ago
- ⭐389 · Updated 5 months ago