atticusg / CausalAbstraction

☆23

Alternatives and similar repositories for CausalAbstraction

Users interested in CausalAbstraction are comparing it to the repositories listed below.

explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆56 Updated last year
wesg52 / sparse-probing-paper
Sparse probing paper full code.
☆62 Updated last year
KihoPark / linear_rep_geometry
☆106 Updated 8 months ago
redwoodresearch / Easy-Transformer
☆126 Updated last year
ARBORproject / arborproject.github.io
☆81 Updated 7 months ago
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆12 Updated 8 months ago
jacobdunefsky / transcoder_circuits
☆179 Updated 11 months ago
saprmarks / feature-circuits
☆190 Updated this week
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆77 Updated 2 months ago
Butanium / nnterp
Unified access to Large Language Model modules using NNsight
☆49 Updated last week
interpretingdl / eacl2024_transformer_interpretability_tutorial
Materials for EACL2024 tutorial: Transformer-specific Interpretability
☆60 Updated last year
adamkarvonen / SAEBench
☆131 Updated last week
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 Updated 9 months ago
jkallini / mission-impossible-language-models
Code repository for the paper "Mission: Impossible Language Models."
☆54 Updated 3 weeks ago
saprmarks / geometry-of-truth
☆92 Updated last year
hannamw / EAP-IG
☆53 Updated 2 months ago
mlepori1 / NeuroSurgeon
NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers
☆41 Updated 8 months ago
montemac / activation_additions
Algebraic value editing in pretrained language models
☆66 Updated last year
Nix07 / finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…
☆28 Updated last year
evandez / relations
How do transformer LMs encode relations?
☆55 Updated last year
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆42 Updated last year
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆111 Updated last month
ArthurConmy / Automatic-Circuit-Discovery
☆244 Updated last year
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆218 Updated this week
aaronmueller / MIB
Landing page for MIB: A Mechanistic Interpretability Benchmark
☆20 Updated 2 months ago
curt-tigges / probity
☆19 Updated 6 months ago
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆12 Updated last year
neelnanda-io / 1L-Sparse-Autoencoder
☆128 Updated last year
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆23 Updated last week
adamkarvonen / dictionary_learning_demo
☆18 Updated last month