bartbussmann/matryoshka_sae

☆72

Alternatives and similar repositories for matryoshka_sae

Users interested in matryoshka_sae are comparing it to the repositories listed below.

noanabeshima / matryoshka-saes
☆33 · Nov 28, 2024 · Updated last year
adamkarvonen / SAEBench
☆179 · May 1, 2026 · Updated 2 months ago
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆29 · Nov 20, 2024 · Updated last year
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆269 · Updated this week
adamkarvonen / dictionary_learning_demo
☆26 · Aug 23, 2025 · Updated 11 months ago
decoderesearch / SAELens
Training Sparse Autoencoders on Language Models
☆1,487 · Updated this week
saprmarks / dictionary_learning
☆428 · Aug 21, 2025 · Updated 11 months ago
PKU-Alignment / SAE-V
[ICML 2025 Poster] SAE-V: Interpreting Multimodal Models for Enhanced Alignment
☆17 · Jun 5, 2025 · Updated last year
bartbussmann / BatchTopK
Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)
☆67 · Jul 24, 2025 · Updated last year
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆68 · Oct 27, 2024 · Updated last year
efarrell1 / train_sparse_autoencoder
Trains Sparse Autoencoders based on outputs from language models
☆11 · Oct 7, 2024 · Updated last year
tilde-research / sieve
Applying SAEs for fine-grained control
☆27 · Dec 15, 2024 · Updated last year
kim-dahye / steerers
☆19 · May 19, 2025 · Updated last year
nickjiang2378 / interp-embed
A toolkit for embedding text datasets with sparse autoencoders
☆30 · Mar 24, 2026 · Updated 4 months ago
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆735 · Jul 20, 2026 · Updated last week
saprmarks / feature-circuits
☆223 · Oct 14, 2025 · Updated 9 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆58 · May 1, 2025 · Updated last year
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
dynamical-inference / patchsae
Implementation of PatchSAE as presented in "Sparse autoencoders reveal selective remapping of visual concepts during adaptation"
☆33 · Apr 22, 2026 · Updated 3 months ago
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆30 · Apr 2, 2025 · Updated last year
ml-research / ActivationReasoning
☆15 · May 21, 2026 · Updated 2 months ago
shengliu66 / LC
Official Implementation of Avoiding spurious correlations via logit correction
☆17 · May 6, 2023 · Updated 3 years ago
IBM / sae-steering
Code to enable layer-level steering in LLMs using sparse auto encoders
☆34 · Sep 18, 2025 · Updated 10 months ago
lasr-spelling / sae-spelling
Code for the paper "A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders"
☆16 · Dec 28, 2025 · Updated 7 months ago
mpsae / MP-SAE
☆17 · May 19, 2026 · Updated 2 months ago
Tyrion58 / T3D
The official implementation of T3D: T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative O…
☆25 · Jul 10, 2026 · Updated 2 weeks ago
rmovva / HypotheSAEs
HypotheSAEs: hypothesizing interpretable relationships in text datasets using sparse autoencoders. https://arxiv.org/abs/2502.04382
☆92 · Updated this week
neelnanda-io / Crosscoders
☆60 · Nov 19, 2024 · Updated last year
METR / hcast-public
☆22 · Jul 6, 2026 · Updated 3 weeks ago
EvolvingLMMs-Lab / multimodal-sae
[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
☆199 · Sep 26, 2025 · Updated 10 months ago
aypan17 / latentqa
☆34 · Nov 16, 2025 · Updated 8 months ago
Trustworthy-ML-Lab / CB-LLMs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliabilit…
☆33 · Feb 5, 2026 · Updated 5 months ago
Model-GLUE / Model-GLUE
☆18 · Aug 19, 2024 · Updated last year
mandyyyyii / east
☆19 · Aug 4, 2025 · Updated 11 months ago
openai / sparse_autoencoder
☆597 · Jul 19, 2024 · Updated 2 years ago
WolodjaZ / MSAE
Interpreting CLIP with Hierarchical Sparse Autoencoders (ICML 2025)
☆28 · Jan 17, 2026 · Updated 6 months ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆58 · Oct 30, 2025 · Updated 8 months ago
ssfgunner / VL-SAE
[NeurIPS 2025] This is the official repository for VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Se…
☆15 · Oct 29, 2025 · Updated 9 months ago
dtch1997 / steering-bench
Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"
☆22 · Dec 14, 2024 · Updated last year