jkminder / dictionary_learning

Modified to support crosscoder training.

☆17

Alternatives and similar repositories for dictionary_learning

Users who are interested in dictionary_learning compare it to the libraries listed below.

neelnanda-io / Crosscoders
☆43 Updated 6 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆55 Updated 7 months ago
adamkarvonen / SAEBench
☆97 Updated last month
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆181 Updated this week
KihoPark / linear_rep_geometry
☆93 Updated 3 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆121 Updated last year
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆65 Updated last month
jacobdunefsky / transcoder_circuits
☆124 Updated 6 months ago
saprmarks / feature-circuits
☆171 Updated last month
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆200 Updated 5 months ago
curt-tigges / probity
☆12 Updated last month
tilde-research / sieve
Applying SAEs for fine-grained control
☆18 Updated 5 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆52 Updated last month
ArthurConmy / Automatic-Circuit-Discovery
☆223 Updated 8 months ago
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆18 Updated 2 months ago
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆75 Updated 6 months ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆23 Updated this week
jbloomAus / SAEDashboard
☆50 Updated last month
callummcdougall / sae_visualizer
☆28 Updated last year
ARBORproject / arborproject.github.io
☆75 Updated 3 months ago
Butanium / nnterp
A small package implementing some useful wrapping around nnsight
☆13 Updated last month
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆247 Updated last year
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆253 Updated 5 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆103 Updated 3 months ago
saprmarks / dictionary_learning
☆302 Updated 2 weeks ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆248 Updated 10 months ago
amack315 / unsupervised-steering-vectors
☆31 Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆128 Updated 3 weeks ago
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆34 Updated last year
saprmarks / geometry-of-truth
☆83 Updated 9 months ago