mlepori1 / NeuroSurgeon
NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Hugging Face Transformers.
☆43 · Feb 12, 2025 · Updated last year
Alternatives and similar repositories for NeuroSurgeon
Users that are interested in NeuroSurgeon are comparing it to the libraries listed below
- PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models ☆12 · May 30, 2025 · Updated 8 months ago
- ☆51 · Oct 23, 2023 · Updated 2 years ago
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers ☆21 · May 16, 2023 · Updated 2 years ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 · Jan 17, 2024 · Updated 2 years ago
- Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"! ☆11 · Oct 16, 2024 · Updated last year
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- ☆17 · Aug 30, 2025 · Updated 5 months ago
- Automatically check for new apartments in Hamburg from various real estate providers ☆16 · Jun 2, 2024 · Updated last year
- Temporal Neural Networks ☆21 · Jan 14, 2026 · Updated last month
- ☆37 · May 28, 2023 · Updated 2 years ago
- ☆19 · Sep 16, 2025 · Updated 5 months ago
- CMD: a framework for Context-aware Model self-Detoxification (EMNLP2024 Long Paper) ☆17 · Feb 10, 2025 · Updated last year
- Tasks for describing differences between text distributions. ☆17 · Aug 9, 2024 · Updated last year
- [NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in La… ☆27 · Nov 3, 2025 · Updated 3 months ago
- Find informative examples to efficiently (human)-evaluate NLG models. ☆18 · Feb 9, 2026 · Updated last week
- Fast Axiomatic Attribution for Neural Networks (NeurIPS*2021) ☆16 · May 12, 2023 · Updated 2 years ago
- Attribute statements generated by LLMs to preceding tokens using attention weights. ☆22 · Apr 22, 2025 · Updated 9 months ago
- Engine for collecting, uploading, and downloading model activations ☆26 · Apr 2, 2025 · Updated 10 months ago
- ☆17 · Apr 28, 2022 · Updated 3 years ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆71 · Jun 19, 2024 · Updated last year
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals. ☆18 · Apr 25, 2021 · Updated 4 years ago
- A library for mechanistic anomaly detection ☆22 · Jan 9, 2025 · Updated last year
- DecompX: Explaining Transformers Decisions by Propagating Token Decomposition [ACL 2023] ☆19 · Jul 3, 2025 · Updated 7 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆165 · Jun 25, 2025 · Updated 7 months ago
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆39 · Dec 27, 2022 · Updated 3 years ago
- ☆31 · Updated this week
- PyTorch and NNsight implementation of AtP* (Kramar et al. 2024, DeepMind) ☆20 · Jan 19, 2025 · Updated last year
- ☆83 · Feb 25, 2025 · Updated 11 months ago
- Code for paper "Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals" ☆18 · Oct 17, 2022 · Updated 3 years ago
- Word sense disambiguation test sets for NMT ☆20 · Dec 3, 2020 · Updated 5 years ago
- Overcomplete is a Vision-based SAE Toolbox ☆118 · Dec 4, 2025 · Updated 2 months ago
- ☆23 · Sep 24, 2024 · Updated last year
- ☆207 · Oct 14, 2025 · Updated 4 months ago
- Measuring if attention is explanation with ROAR ☆22 · Mar 3, 2023 · Updated 2 years ago
- The official repository for our paper "Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks". We… ☆46 · Oct 3, 2023 · Updated 2 years ago
- Maximal Mutual Information (MMI) Tagger ☆25 · Jun 6, 2019 · Updated 6 years ago
- ☆24 · Jun 17, 2025 · Updated 8 months ago
- https://arxiv.org/abs/2209.15162 ☆53 · Jan 24, 2023 · Updated 3 years ago
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ☆26 · Mar 10, 2025 · Updated 11 months ago