FarnoushRJ / RelP
[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching"
☆23 Updated 2 months ago
Alternatives and similar repositories for RelP
Users who are interested in RelP are comparing it to the libraries listed below.
- PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models ☆12 Updated 7 months ago
- Materials for EACL2024 tutorial: Transformer-specific Interpretability ☆61 Updated last year
- ☆16 Updated 4 months ago
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers ☆42 Updated 10 months ago
- ☆32 Updated 10 months ago
- Measuring the Mixing of Contextual Information in the Transformer ☆34 Updated 2 years ago
- Sparse probing paper full code. ☆66 Updated 2 years ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆57 Updated 2 months ago
- Landing page for MIB: A Mechanistic Interpretability Benchmark ☆23 Updated 4 months ago
- ☆112 Updated 10 months ago
- ☆83 Updated 10 months ago
- Simple and scalable tools for data-driven pretraining data selection. ☆29 Updated 6 months ago
- ☆138 Updated last week
- ☆51 Updated 2 years ago
- ☆24 Updated 8 months ago
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ☆213 Updated 5 months ago
- [ICLR 2025] General-purpose activation steering library ☆133 Updated 3 months ago
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆12 Updated 11 months ago
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆175 Updated 6 months ago
- Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs) ☆57 Updated 5 months ago
- ☆27 Updated 3 weeks ago
- Overcomplete is a Vision-based SAE Toolbox ☆112 Updated last month
- ☆68 Updated 5 months ago
- ☆23 Updated 4 months ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆44 Updated last year
- "Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper) ☆89 Updated 2 years ago
- Unified access to Large Language Model modules using NNsight ☆71 Updated last week
- DecompX: Explaining Transformers Decisions by Propagating Token Decomposition [ACL 2023] ☆19 Updated 6 months ago
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆28 Updated 2 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆79 Updated last year