efarrell1/train_sparse_autoencoder

Trains Sparse Autoencoders based on outputs from language models

☆11

Alternatives and similar repositories for train_sparse_autoencoder

Users who are interested in train_sparse_autoencoder are comparing it to the libraries listed below.

duykhuongnguyen / MAT-Steer
☆15 · Aug 19, 2025 · Updated 6 months ago
fiveai / understanding_safety_finetuning
Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)
☆12 · Oct 31, 2024 · Updated last year
tim-lawson / mlsae
Multi-Layer Sparse Autoencoders (ICLR 2025)
☆29 · Feb 6, 2026 · Updated 3 weeks ago
jacobdunefsky / llm-steering-opt
Tools for optimizing steering vectors in LLMs.
☆20 · Apr 10, 2025 · Updated 10 months ago
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆15 · Oct 21, 2025 · Updated 4 months ago
MikaStars39 / FeatureAlignment
FeatureAlignment = Alignment + Mechanistic Interpretability
☆34 · Mar 8, 2025 · Updated 11 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆64 · Oct 27, 2024 · Updated last year
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆16 · Dec 15, 2024 · Updated last year
princeton-nlp / Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
☆66 · Aug 15, 2025 · Updated 6 months ago
matchten / LoRA-Models-for-SAEs
Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"
☆17 · Mar 31, 2025 · Updated 10 months ago
JasonGross / guarantees-based-mechanistic-interpretability
☆17 · Updated this week
noanabeshima / matryoshka-saes
☆27 · Nov 28, 2024 · Updated last year
adamkarvonen / dictionary_learning_demo
☆24 · Aug 23, 2025 · Updated 6 months ago
coding-coworking-club / python-2020-spring
ccClub Python 2020 Spring
☆15 · Mar 27, 2020 · Updated 5 years ago
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆25 · Feb 4, 2026 · Updated 3 weeks ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55 · May 1, 2025 · Updated 9 months ago
adamkarvonen / SAEBench
☆150 · Dec 30, 2025 · Updated last month
ordavid-s / snmf-mlp-decomposition
☆13 · Oct 5, 2025 · Updated 4 months ago
ilsilfverskiold / ai-personalized-tech-reports-discord
Build an AI bot in Discord to serve user's personalized reports on what's up in tech
☆28 · Sep 14, 2025 · Updated 5 months ago
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆47 · May 31, 2024 · Updated last year
Mattral / mattral
my profile readme
☆14 · Updated this week
chanind / linear-relational
Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch
☆10 · Aug 7, 2024 · Updated last year
felixbinder / introspection_self_prediction
Code for experiments on self-prediction as a way to measure introspection in LLMs
☆16 · Dec 10, 2024 · Updated last year
spectraldani / thindeepgps
Reference implementation of Thin and Deep Gaussian Processes (NeurIPS 2023)
☆14 · Nov 25, 2024 · Updated last year
ag8 / sha-transformer
☆12 · Jul 8, 2024 · Updated last year
tim-lawson / skip-middle
Learning to Skip the Middle Layers of Transformers
☆17 · Aug 7, 2025 · Updated 6 months ago
catid / cuda_float_compress
Python package for compressing floating-point PyTorch tensors
☆13 · Jul 22, 2024 · Updated last year
GokuMohandas / SELU
🤖 Implementation of Self Normalizing Networks (SNN) in PyTorch.
☆12 · Jun 19, 2017 · Updated 8 years ago
kyolebu / janestreet-gpumode-hackathon
1st Place Team Crane: @aswinkumar1999 @rathull @kyolebu
☆29 · Sep 8, 2025 · Updated 5 months ago
wrudman / NOTICE
☆13 · Apr 10, 2025 · Updated 10 months ago
Surrey-EEEM071-CVDL / CourseWork
The course work repo for UoSurrey EEEM071 (2023 Spring)
☆11 · May 9, 2023 · Updated 2 years ago
tonychenxyz / vit-interpret
Official implementation of "Interpreting and Controlling Vision Foundation Models via Text Explanations"
☆14 · May 29, 2024 · Updated last year
CSHaitao / CaseGen
A Benchmark for Multi-Stage Legal Case Documents Generation
☆15 · Feb 24, 2025 · Updated last year
Trustworthy-ML-Lab / Describe-and-Dissect
[TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models
☆10 · Feb 20, 2025 · Updated last year
wangitu / CherryQ
☆14 · May 21, 2024 · Updated last year
VanessB / mutinfo
Mutual information estimators and benchmarks
☆14 · Updated this week
coffee4j / coffee4j
A Java-based framework for combinatorial test input generation, fault characterization and automated test execution.
☆11 · Jan 22, 2024 · Updated 2 years ago
Hamme122 / gaussian-flow
Unofficial implementation of "Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle"
☆13 · Jul 3, 2024 · Updated last year
zjunlp / CaKE
[EMNLP 2025] Circuit-Aware Editing Enables Generalizable Knowledge Learners
☆18 · Nov 17, 2025 · Updated 3 months ago