oli-clive-griffin / crosscode

A library for training crosscoders

☆12

Alternatives and similar repositories for crosscode

Users who are interested in crosscode are comparing it to the libraries listed below.

Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆15 · Updated last month
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆60 · Updated last year
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55 · Updated 6 months ago
amack315 / unsupervised-steering-vectors
☆36 · Updated last year
neelnanda-io / Crosscoders
☆56 · Updated last year
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al. 2024, DeepMind)
☆20 · Updated 10 months ago
jbloomAus / SAEDashboard
☆79 · Updated last month
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆80 · Updated 4 months ago
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆24 · Updated 7 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆225 · Updated last week
HugoFry / mats_sae_training_for_ViTs
☆22 · Updated last year
google-deepmind / mishax
☆143 · Updated 2 months ago
JasonGross / guarantees-based-mechanistic-interpretability
☆17 · Updated last week
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆31 · Updated 5 months ago
tilde-research / sieve
Applying SAEs for fine-grained control
☆24 · Updated 11 months ago
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆24 · Updated last month
hijohnnylin / automated-interpretability
☆16 · Updated last month
neelnanda-io / 1L-Sparse-Autoencoder
☆132 · Updated 2 years ago
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Updated last year
yash-srivastava19 / arrakis
Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
☆31 · Updated 7 months ago
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆84 · Updated 11 months ago
Butanium / nnterp
Unified access to Large Language Model modules using NNsight
☆59 · Updated last week
mishajw / repeng
Experiments with representation engineering
☆13 · Updated last year
hijohnnylin / neuronpedia-scorer
☆17 · Updated last year
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆114 · Updated 4 months ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130 · Updated 3 years ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28 · Updated last year
callummcdougall / sae_visualizer
☆29 · Updated last year
curt-tigges / crosslayer-coding
☆15 · Updated 4 months ago
curt-tigges / probity
☆19 · Updated 7 months ago