OpenMOSS/Language-Model-SAEs

Performant framework for training, analyzing and visualizing Sparse Autoencoders (SAEs) and their frontier variants.

☆209

Alternatives and similar repositories for Language-Model-SAEs

Users interested in Language-Model-SAEs also compare it with the libraries listed below.

ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Jul 18, 2024 · Updated last year
decoderesearch / SAELens
Training Sparse Autoencoders on Language Models
☆1,295 · Mar 19, 2026 · Updated 3 weeks ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆64 · Oct 27, 2024 · Updated last year
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆248 · Updated this week
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆704 · Updated this week
adamkarvonen / SAEBench
☆158 · Dec 30, 2025 · Updated 3 months ago
OpenMOSS / Lorsa
☆29 · Nov 9, 2025 · Updated 5 months ago
jbloomAus / SAEDashboard
☆92 · Dec 18, 2025 · Updated 3 months ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆252 · Feb 27, 2026 · Updated last month
adamkarvonen / dictionary_learning_demo
☆25 · Aug 23, 2025 · Updated 7 months ago
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆27 · Nov 20, 2024 · Updated last year
tim-lawson / mlsae
Multi-Layer Sparse Autoencoders (ICLR 2025)
☆29 · Feb 6, 2026 · Updated 2 months ago
saprmarks / dictionary_learning
☆407 · Aug 21, 2025 · Updated 7 months ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆296 · Jul 20, 2024 · Updated last year
saprmarks / feature-circuits
☆212 · Oct 14, 2025 · Updated 5 months ago
openai / sparse_autoencoder
☆582 · Jul 19, 2024 · Updated last year
TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆3,272 · Updated this week
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆885 · Updated this week
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆16 · Oct 21, 2025 · Updated 5 months ago
hijohnnylin / neuronpedia-scorer
☆17 · Feb 14, 2024 · Updated 2 years ago
science-of-finetuning / sparsity-artifacts-crosscoders
Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.
☆17 · Nov 21, 2025 · Updated 4 months ago
MikaStars39 / FeatureAlignment
FeatureAlignment = Alignment + Mechanistic Interpretability
☆35 · Mar 8, 2025 · Updated last year
saprmarks / geometry-of-truth
☆104 · Aug 8, 2024 · Updated last year
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆79 · Jan 16, 2026 · Updated 2 months ago
adamkarvonen / SAE_BoardGameEval
☆24 · Jan 28, 2025 · Updated last year
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆172 · Apr 21, 2025 · Updated 11 months ago
noanabeshima / matryoshka-saes
☆28 · Nov 28, 2024 · Updated last year
zepingyu0512 / awesome-SAE
awesome SAE papers
☆73 · May 24, 2025 · Updated 10 months ago
open-nlplab / fastIE
Information Extraction related tools and models
☆10 · Mar 16, 2023 · Updated 3 years ago
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆26 · Feb 4, 2026 · Updated 2 months ago
curt-tigges / probity
☆20 · Apr 10, 2025 · Updated 11 months ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆338 · Dec 18, 2024 · Updated last year
oclivegriffin / crosscode
A library for training crosscoders
☆16 · May 28, 2025 · Updated 10 months ago
OpenMOSS / Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
☆85 · Feb 5, 2024 · Updated 2 years ago
duykhuongnguyen / MAT-Steer
☆18 · Aug 19, 2025 · Updated 7 months ago
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆298 · Nov 10, 2023 · Updated 2 years ago
princeton-nlp / Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
☆66 · Aug 15, 2025 · Updated 7 months ago
efarrell1 / train_sparse_autoencoder
Trains Sparse Autoencoders based on outputs from language models
☆11 · Oct 7, 2024 · Updated last year