Aaquib111/edge-attribution-patching

Code for my NeurIPS 2023 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"

☆48

Alternatives and similar repositories for edge-attribution-patching

Users who are interested in edge-attribution-patching are comparing it to the libraries listed below.

hannamw / EAP-IG
☆83 · May 23, 2026 · Updated 2 months ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆99 · Dec 31, 2025 · Updated 6 months ago
ArthurConmy / Automatic-Circuit-Discovery
☆293 · Oct 1, 2024 · Updated last year
princeton-nlp / Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
☆70 · Aug 15, 2025 · Updated 11 months ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆33 · May 23, 2024 · Updated 2 years ago
redwoodresearch / Easy-Transformer
☆148 · Aug 4, 2024 · Updated last year
saprmarks / dictionary_learning
☆427 · Aug 21, 2025 · Updated 11 months ago
jiahai-feng / binding-iclr
☆19 · Mar 5, 2024 · Updated 2 years ago
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆17 · Dec 15, 2024 · Updated last year
evandez / relations
How do transformer LMs encode relations?
☆59 · Feb 24, 2024 · Updated 2 years ago
ndif-team / nnterp
Unified access to Large Language Model modules using NNsight
☆116 · Jul 2, 2026 · Updated 3 weeks ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆267 · Feb 27, 2026 · Updated 4 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆157 · Feb 21, 2025 · Updated last year
science-of-finetuning / sparsity-artifacts-crosscoders
Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.
☆17 · Jul 6, 2026 · Updated 2 weeks ago
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆17 · Oct 21, 2025 · Updated 9 months ago
Aaquib111 / Sparse-GPT-Finetuning
Code for my ICLR 2023 TinyPapers paper "Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models"
☆16 · May 26, 2023 · Updated 3 years ago
Nix07 / finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…
☆32 · Oct 27, 2025 · Updated 8 months ago
AsaCooperStickland / situational-awareness-evals
Measuring the situational awareness of language models
☆41 · Feb 12, 2024 · Updated 2 years ago
zjunlp / KnowledgeCircuits
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
☆172 · Nov 14, 2025 · Updated 8 months ago
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
jam3scampbell / llama-lying
Code for our paper "Localizing Lying in Llama"
☆15 · Apr 24, 2025 · Updated last year
ndif-team / ndif
The NDIF server, which performs deep inference and serves nnsight requests remotely
☆50 · Updated this week
adamkarvonen / dictionary_learning_demo
☆26 · Aug 23, 2025 · Updated 11 months ago
saprmarks / feature-circuits
☆223 · Oct 14, 2025 · Updated 9 months ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆35 · Jun 11, 2025 · Updated last year
TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆3,710 · Updated this week
duykhuongnguyen / MAT-Steer
☆21 · Aug 19, 2025 · Updated 11 months ago
adamkarvonen / SAE_BoardGameEval
☆25 · Jan 28, 2025 · Updated last year
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆997 · Updated this week
IBM / sae-steering
Code to enable layer-level steering in LLMs using sparse auto encoders
☆34 · Sep 18, 2025 · Updated 10 months ago
Helsinki-NLP / OPUS-MT-testsets
benchmarks for evaluating MT models
☆11 · Jun 26, 2024 · Updated 2 years ago
tripos-education / maths-tripos-questions
Archive of questions from the Cambridge Mathematics Tripos
☆10 · Jun 6, 2022 · Updated 4 years ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆358 · Apr 30, 2026 · Updated 2 months ago
chanind / linear-relational
Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch
☆11 · Aug 7, 2024 · Updated last year
cardiffnlp / dialz
The official repo for the Dialz Python library - a toolkit for steering vector research.
☆27 · Mar 26, 2026 · Updated 3 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆58 · May 1, 2025 · Updated last year
UCSC-VLAA / AttnGCG-attack
[TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
☆27 · Jun 17, 2025 · Updated last year
MikaStars39 / FeatureAlignment
FeatureAlignment = Alignment + Mechanistic Interpretability
☆35 · Mar 8, 2025 · Updated last year
decoderesearch / automated-interpretability
☆24 · Feb 13, 2026 · Updated 5 months ago