science-of-finetuning / dictionary_learning

Modified to support crosscoder training.

☆20

Alternatives and similar repositories for dictionary_learning

Users who are interested in dictionary_learning are comparing it to the libraries listed below.

saprmarks / feature-circuits
☆182 · Updated 3 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆193 · Updated this week
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆57 · Updated 8 months ago
ArthurConmy / Automatic-Circuit-Discovery
☆231 · Updated 9 months ago
neelnanda-io / Crosscoders
☆47 · Updated 8 months ago
adamkarvonen / SAEBench
☆105 · Updated last month
jacobdunefsky / transcoder_circuits
☆146 · Updated 8 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆123 · Updated last year
curt-tigges / probity
☆15 · Updated 3 months ago
saprmarks / dictionary_learning
☆315 · Updated this week
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆206 · Updated 7 months ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆70 · Updated 2 months ago
KihoPark / linear_rep_geometry
☆100 · Updated 5 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆53 · Updated 2 months ago
Butanium / nnterp
Unified access to Large Language Model modules using NNsight
☆21 · Updated this week
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆164 · Updated last year
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆115 · Updated 4 months ago
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆23 · Updated 7 months ago
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆40 · Updated last year
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆259 · Updated last year
redwoodresearch / Easy-Transformer
☆121 · Updated 11 months ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆27 · Updated last month
curt-tigges / crosslayer-coding
☆13 · Updated last week
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆20 · Updated 3 months ago
tilde-research / sieve
Applying SAEs for fine-grained control
☆22 · Updated 7 months ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆255 · Updated 11 months ago
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆77 · Updated 7 months ago
bartbussmann / matryoshka_sae
☆35 · Updated 6 months ago
tim-lawson / mlsae
Multi-Layer Sparse Autoencoders (ICLR 2025)
☆22 · Updated 5 months ago
jbloomAus / SAEDashboard
☆60 · Updated last week