NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Hugging Face Transformers.
☆43 · Feb 12, 2025 · Updated last year
Alternatives and similar repositories for NeuroSurgeon
Users who are interested in NeuroSurgeon are comparing it to the libraries listed below:
- PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models · ☆12 · May 30, 2025 · Updated 9 months ago
- ☆52 · Oct 23, 2023 · Updated 2 years ago
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers · ☆21 · May 16, 2023 · Updated 2 years ago
- Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"! · ☆11 · Oct 16, 2024 · Updated last year
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" · ☆13 · Jul 18, 2024 · Updated last year
- Auto check for new apartments in Hamburg from various real estate providers · ☆16 · Jun 2, 2024 · Updated last year
- ☆17 · Aug 30, 2025 · Updated 6 months ago
- CMD: a framework for Context-aware Model self-Detoxification (EMNLP2024 Long Paper) · ☆17 · Feb 10, 2025 · Updated last year
- ☆19 · Sep 16, 2025 · Updated 5 months ago
- Tasks for describing differences between text distributions. · ☆17 · Aug 9, 2024 · Updated last year
- Find informative examples to efficiently (human)-evaluate NLG models. · ☆18 · Feb 27, 2026 · Updated last week
- [NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in La… · ☆27 · Nov 3, 2025 · Updated 4 months ago
- ☆17 · Apr 28, 2022 · Updated 3 years ago
- Fast Axiomatic Attribution for Neural Networks (NeurIPS*2021) · ☆16 · Feb 24, 2026 · Updated last week
- Engine for collecting, uploading, and downloading model activations · ☆26 · Apr 2, 2025 · Updated 11 months ago
- ☆23 · Jun 30, 2025 · Updated 8 months ago
- Attribute statements generated by LLMs to preceding tokens using attention weights. · ☆22 · Apr 22, 2025 · Updated 10 months ago
- Temporal Neural Networks · ☆28 · Mar 2, 2026 · Updated last week
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" · ☆71 · Jun 19, 2024 · Updated last year
- DecompX: Explaining Transformers Decisions by Propagating Token Decomposition [ACL 2023] · ☆19 · Jul 3, 2025 · Updated 8 months ago
- ☆18 · Oct 6, 2022 · Updated 3 years ago
- A library for mechanistic anomaly detection · ☆22 · Jan 9, 2025 · Updated last year
- A library for efficient patching and automatic circuit discovery. · ☆91 · Dec 31, 2025 · Updated 2 months ago
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals. · ☆18 · Apr 25, 2021 · Updated 4 years ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods · ☆171 · Updated this week
- Code for "Tracing Knowledge in Language Models Back to the Training Data" · ☆39 · Dec 27, 2022 · Updated 3 years ago
- ☆32 · Feb 15, 2026 · Updated 3 weeks ago
- Saliency Cards are transparency documentation for saliency methods. Learn about new saliency methods or document your own! · ☆19 · Jun 9, 2023 · Updated 2 years ago
- ☆84 · Feb 25, 2025 · Updated last year
- Code for paper "Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals" · ☆18 · Oct 17, 2022 · Updated 3 years ago
- ☆24 · Sep 24, 2024 · Updated last year
- Word sense disambiguation test sets for NMT · ☆20 · Dec 3, 2020 · Updated 5 years ago
- Overcomplete is a Vision-based SAE Toolbox · ☆126 · Dec 4, 2025 · Updated 3 months ago
- ☆209 · Oct 14, 2025 · Updated 4 months ago
- Measuring if attention is explanation with ROAR · ☆22 · Mar 3, 2023 · Updated 3 years ago
- ☆24 · Jun 17, 2025 · Updated 8 months ago
- Maximal Mutual Information (MMI) Tagger · ☆25 · Jun 6, 2019 · Updated 6 years ago
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ · ☆26 · Mar 10, 2025 · Updated 11 months ago
- Interpreting Language Models with Contrastive Explanations (EMNLP 2022 Best Paper Honorable Mention) · ☆62 · May 12, 2022 · Updated 3 years ago