neelnanda-io / 1L-Sparse-Autoencoder

☆132

Alternatives and similar repositories for 1L-Sparse-Autoencoder

Users interested in 1L-Sparse-Autoencoder are comparing it to the libraries listed below.


callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆232 · Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆260 · Updated last year
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆303 · Updated last year
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆234 · Updated last week
neelnanda-io / Crosscoders
☆58 · Updated last year
saprmarks / dictionary_learning
☆373 · Updated 4 months ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆286 · Updated last year
jacobdunefsky / transcoder_circuits
☆192 · Updated last year
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆289 · Updated 2 years ago
redwoodresearch / Easy-Transformer
☆134 · Updated last year
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆63 · Updated last year
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆56 · Updated 7 months ago
saprmarks / feature-circuits
☆196 · Updated 2 months ago
adamkarvonen / SAEBench
☆136 · Updated last month
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆82 · Updated 5 months ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆131 · Updated 3 years ago
KihoPark / linear_rep_geometry
☆112 · Updated 10 months ago
callummcdougall / sae_visualizer
☆29 · Updated last year
collin-burns / discovering_latent_knowledge
☆283 · Updated last year
wesg52 / sparse-probing-paper
Sparse probing paper full code.
☆66 · Updated 2 years ago
ARBORproject / arborproject.github.io
☆83 · Updated 9 months ago
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆25 · Updated 2 months ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆33 · Updated 6 months ago
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆554 · Updated 4 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Hugging Face
☆134 · Updated 10 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆240 · Updated 10 months ago
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆201 · Updated last year
Butanium / nnterp
Unified access to Large Language Model modules using NNsight
☆70 · Updated last month
amack315 / unsupervised-steering-vectors
☆36 · Updated last year
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆673 · Updated last week