mlepori1 / NeuroSurgeon
NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Hugging Face Transformers.
☆40 Updated 2 months ago
Alternatives and similar repositories for NeuroSurgeon:
Users who are interested in NeuroSurgeon are comparing it to the libraries listed below.
- A library for efficient patching and automatic circuit discovery. ☆62 Updated 2 months ago
- ☆91 Updated 2 months ago
- Sparse probing paper full code. ☆56 Updated last year
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆30 Updated 10 months ago
- ☆26 Updated this week
- ☆217 Updated 6 months ago
- ☆66 Updated 3 years ago
- ☆29 Updated 9 months ago
- ☆61 Updated last year
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆25 Updated last year
- ☆41 Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆43 Updated 6 months ago
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆11 Updated 2 months ago
- Sparse Autoencoder Training Library ☆47 Updated 5 months ago
- ☆36 Updated 2 years ago
- ☆83 Updated this week
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… ☆90 Updated 3 years ago
- ☆121 Updated last year
- ☆157 Updated last week
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆71 Updated last month
- Steering Llama 2 with Contrastive Activation Addition ☆137 Updated 10 months ago
- Efficient empirical NTKs in PyTorch ☆18 Updated 2 years ago
- Mechanistic Interpretability for Transformer Models ☆50 Updated 2 years ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 Updated 10 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆166 Updated this week
- The official repository for our paper "Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks". We… ☆46 Updated last year
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆99 Updated last year
- ☆19 Updated last month
- Experiments with representation engineering ☆11 Updated last year
- ☆30 Updated 3 months ago