FarnoushRJ / RelP
[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching"
☆21 · Updated last month
Alternatives and similar repositories for RelP
Users who are interested in RelP are comparing it to the libraries listed below.
- PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models ☆11 · Updated 6 months ago
- Materials for EACL 2024 tutorial: Transformer-specific Interpretability ☆61 · Updated last year
- Landing page for MIB: A Mechanistic Interpretability Benchmark ☆21 · Updated 3 months ago
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers ☆42 · Updated 10 months ago
- Measuring the Mixing of Contextual Information in the Transformer ☆33 · Updated 2 years ago
- ☆111 · Updated 10 months ago
- ☆30 · Updated 10 months ago
- Sparse probing paper full code. ☆65 · Updated last year
- ☆136 · Updated 3 weeks ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆57 · Updated last month
- Simple and scalable tools for data-driven pretraining data selection. ☆29 · Updated 6 months ago
- ☆83 · Updated 9 months ago
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ☆211 · Updated 5 months ago
- ☆65 · Updated 4 months ago
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆171 · Updated 5 months ago
- ☆132 · Updated 2 years ago
- DecompX: Explaining Transformers Decisions by Propagating Token Decomposition [ACL 2023] ☆18 · Updated 5 months ago
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆12 · Updated 10 months ago
- [ICLR 2025] General-purpose activation steering library ☆127 · Updated 2 months ago
- ☆22 · Updated 3 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆231 · Updated last week
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆43 · Updated last year
- Mechanistic understanding and validation of large AI models with SemanticLens ☆47 · Updated last week
- Overcomplete is a Vision-based SAE Toolbox ☆106 · Updated last week
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆28 · Updated last month
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆20 · Updated 10 months ago
- Sparse Autoencoder Training Library ☆55 · Updated 7 months ago
- ☆51 · Updated 2 years ago
- Attribute statements generated by LLMs to preceding tokens using attention weights. ☆19 · Updated 7 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆78 · Updated last year