evan-lloyd / graphpatchLinks

graphpatch is a library for activation patching on PyTorch neural network models.

☆16

Alternatives and similar repositories for graphpatch

Users that are interested in graphpatch are comparing it to the libraries listed below

Sorting:

ArthurConmy / MishformerLens
MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…
☆10Updated 8 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆52Updated last month
hannamw / MIB-circuit-track
☆14Updated last month
hannamw / EAP-IG
☆37Updated last month
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆202Updated 6 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆121Updated last year
adamkarvonen / SAEBench
☆101Updated 3 weeks ago
callummcdougall / sae_visualizer
☆28Updated last year
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆18Updated 2 months ago
jmerullo / lm_vector_arithmetic
☆35Updated 2 years ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆185Updated last week
KihoPark / linear_rep_geometry
☆95Updated 4 months ago
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆51Updated 3 years ago
KihoPark / LLM_Categorical_Hierarchical_Representations
☆99Updated 4 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆228Updated 5 months ago
ARBORproject / arborproject.github.io
☆77Updated 4 months ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆67Updated 2 months ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆47Updated 8 months ago
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆63Updated last year
jbloomAus / SAEDashboard
☆57Updated last week
bartbussmann / BatchTopK
Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)
☆42Updated last month
jannik-brinkmann / multilingual-features
Code for the paper "Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages" (N…
☆14Updated 2 months ago
EleutherAI / steering-llama3
☆29Updated 10 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆108Updated 4 months ago
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆68Updated 2 years ago
ArthurConmy / Automatic-Circuit-Discovery
☆227Updated 8 months ago
tdooms / bilinear-decomposition
Official repo for the paper "Weight-based Decomposition: A Case for Bilinear MLPs"
☆21Updated 7 months ago
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆35Updated 8 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆55Updated 8 months ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆127Updated 2 years ago