science-of-finetuning/crosscoder_learning

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/science-of-finetuning/crosscoder_learning)

science-of-finetuning / crosscoder_learning

Modified to support crosscoder training.

☆27

Alternatives and similar repositories for crosscoder_learning

Users that are interested in crosscoder_learning are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

science-of-finetuning / sparsity-artifacts-crosscoders
View on GitHub
Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.
☆17Jul 6, 2026Updated 2 weeks ago
neelnanda-io / Crosscoders
View on GitHub
☆60Nov 19, 2024Updated last year
tilde-research / sieve
View on GitHub
Applying SAEs for fine-grained control
☆27Dec 15, 2024Updated last year
curt-tigges / crosslayer-coding
View on GitHub
☆18Jul 9, 2025Updated last year
jacobdunefsky / llm-steering-opt
View on GitHub
Tools for optimizing steering vectors in LLMs.
☆22Apr 10, 2025Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
oclivegriffin / crosscode
View on GitHub
A library for training crosscoders
☆17May 28, 2025Updated last year
koayon / atp_star
View on GitHub
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20Jan 19, 2025Updated last year
kdu4108 / context-vs-prior-finetuning
View on GitHub
☆15May 27, 2025Updated last year
efarrell1 / train_sparse_autoencoder
View on GitHub
Trains Sparse Autoencoders based on outputs from language models
☆11Oct 7, 2024Updated last year
jkminder / SALSA-CLRS
View on GitHub
Implementation of "SALSA-CLRS: A Sparse and Scalable Benchmark for Algorithmic Reasoning". SALSA-CLRS is an extension to the original clr…
☆22Nov 21, 2023Updated 2 years ago
EleutherAI / delphi
View on GitHub
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆268Updated this week
saprmarks / dictionary_learning
View on GitHub
☆428Aug 21, 2025Updated 11 months ago
TransluceAI / introspective-interp
View on GitHub
Repository for "Training Language Models To Explain Their Own Computations"
☆23Jul 7, 2026Updated 2 weeks ago
bartbussmann / BatchTopK
View on GitHub
Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)
☆67Jul 24, 2025Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
saprmarks / feature-circuits
View on GitHub
☆223Oct 14, 2025Updated 9 months ago
cloneofsimo / minSAE
View on GitHub
☆30Dec 2, 2024Updated last year
curt-tigges / probity
View on GitHub
☆19Apr 10, 2025Updated last year
MaheepChaudhary / SAE-Ravel
View on GitHub
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13Jan 26, 2025Updated last year
ApolloResearch / e2e_sae
View on GitHub
Sparse Autoencoder Training Library
☆58May 1, 2025Updated last year
jammastergirish / LLMProbe
View on GitHub
☆20Dec 10, 2025Updated 7 months ago
dmis-lab / Monet
View on GitHub
[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers
☆79Jun 23, 2025Updated last year
kutay25 / ai-safety-alignment-camps
View on GitHub
An repository of 2025-2026 AI Safety and Alignment programs, camps, and workshops.
☆22May 18, 2025Updated last year
ckkissane / crosscoder-model-diff-replication
View on GitHub
Open source replication of Anthropic's Crosscoders for Model Diffing
☆68Oct 27, 2024Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
adamkarvonen / dictionary_learning_demo
View on GitHub
☆26Aug 23, 2025Updated 11 months ago
ndif-team / nnsight
View on GitHub
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆998Updated this week
ARBORproject / arborproject.github.io
View on GitHub
☆86Feb 25, 2025Updated last year
felixbinder / introspection_self_prediction
View on GitHub
Code for experiments on self-prediction as a way to measure introspection in LLMs
☆16Dec 10, 2024Updated last year
edeyneka / pdf-reader-extension
View on GitHub
☆13Mar 9, 2025Updated last year
adamkarvonen / SAE_BoardGameEval
View on GitHub
☆25Jan 28, 2025Updated last year
EleutherAI / sparsify
View on GitHub
Sparsify transformers with SAEs and transcoders
☆734Updated this week
goodfire-ai / param-decomp
View on GitHub
Parameter Decomposition
☆135Updated this week
janphilippfranken / sami
View on GitHub
Self-Supervised Alignment with Mutual Information
☆20May 24, 2024Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
Psi-Prod / ppx_system
View on GitHub
ppx_system is a syntax extension to known operating system at compile time
☆12May 9, 2023Updated 3 years ago
UlisseMini / procgen-tools
View on GitHub
Tools for running experiments on RL agents in procgen environments
☆20Apr 5, 2024Updated 2 years ago
shengliu66 / LC
View on GitHub
Official Implementation of Avoiding spurious correlations via logit correction
☆17May 6, 2023Updated 3 years ago
UKGovernmentBEIS / hibayes
View on GitHub
☆53May 17, 2026Updated 2 months ago
callummcdougall / sae_vis
View on GitHub
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆267Feb 27, 2026Updated 4 months ago
google-deepmind / mishax
View on GitHub
☆156Updated this week
safety-research / believe-it-or-not
View on GitHub
Code and data for editing model beliefs with SDF and other methods, and for evaluating the depth of the implanted beliefs.
☆16Oct 23, 2025Updated 9 months ago