FarnoushRJ / RelP
[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching"
★24 Updated 3 months ago
Alternatives and similar repositories for RelP
Users who are interested in RelP are comparing it to the repositories listed below.
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers ★42 Updated 11 months ago
- PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models ★12 Updated 8 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ★57 Updated 3 months ago
- Materials for EACL2024 tutorial: Transformer-specific Interpretability ★63 Updated last year
- ★25 Updated 9 months ago
- ★83 Updated 11 months ago
- Landing page for MIB: A Mechanistic Interpretability Benchmark ★24 Updated 5 months ago
- Measuring the Mixing of Contextual Information in the Transformer ★34 Updated 2 years ago
- Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs) ★60 Updated 6 months ago
- ★32 Updated 11 months ago
- ★115 Updated 11 months ago
- ★51 Updated 2 years ago
- ★143 Updated last month
- ★23 Updated 5 months ago
- Overcomplete is a Vision-based SAE Toolbox ★119 Updated 2 months ago
- Sparse probing paper full code. ★66 Updated 2 years ago
- Simple and scalable tools for data-driven pretraining data selection. ★29 Updated 8 months ago
- ★132 Updated 2 years ago
- ★206 Updated 3 months ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ★20 Updated last year
- A library for efficient patching and automatic circuit discovery. ★88 Updated last month
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ★219 Updated 6 months ago
- Unified access to Large Language Model modules using NNsight ★87 Updated last week
- Attribution-based Parameter Decomposition ★33 Updated 7 months ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ★45 Updated last year
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works… ★19 Updated last year
- ★58 Updated last year
- A toolkit for quantitative evaluation of data attribution methods. ★55 Updated 6 months ago
- Code for "On Measuring Faithfulness of Natural Language Explanations" ★21 Updated last year
- Engine for collecting, uploading, and downloading model activations ★25 Updated 10 months ago