IParraMartin / Sparse-Autoencoder

A PyTorch implementation of a sparse autoencoder (SAE) using MSE loss and a KL divergence penalty

☆24

Alternatives and similar repositories for Sparse-Autoencoder

Users who are interested in Sparse-Autoencoder compare it to the libraries listed below.

jonhue / activeft
PyTorch library for Active Fine-Tuning
☆93 Updated last month
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆64 Updated last year
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆283 Updated last year
multimodal-interpretability / maia
Official implementation of MAIA, A Multimodal Automated Interpretability Agent
☆93 Updated 2 weeks ago
FarnoushRJ / MambaLRP
[NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models" 🐍
☆45 Updated last year
uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆32 Updated last year
CLAIRE-Labo / quantile-reward-policy-optimization
Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok …
☆27 Updated 3 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Hugging Face
☆127 Updated 8 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55 Updated 6 months ago
tommasomncttn / mergenetic
Flexible library for merging large language models (LLMs) via evolutionary optimization (ACL 2025 Demo).
☆91 Updated 3 months ago
facebookresearch / ExploreToM
Code for ExploreToM
☆86 Updated 4 months ago
Zyphra / BlackMamba
Code repository for Black Mamba
☆258 Updated last year
ahans30 / goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
☆92 Updated 11 months ago
shengliu66 / ICV
Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
☆190 Updated 8 months ago
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆83 Updated 11 months ago
multimodal-interpretability / FIND
Official implementation of FIND (NeurIPS '23) Function Interpretation Benchmark and Automated Interpretability Agents
☆51 Updated last year
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆120 Updated last year
SamsungSAILMontreal / nino
Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [ICLR 2025]
☆24 Updated 3 weeks ago
sibyl-dev / Explingo
Explaining ML models using LLMs
☆23 Updated last year
KindXiaoming / physics_of_skill_learning
We study toy models of skill learning.
☆31 Updated 9 months ago
dmis-lab / Monet
[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers
☆73 Updated 4 months ago
vdlad / Remarkable-Robustness-of-LLMs
Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"
☆19 Updated 4 months ago
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆110 Updated this week
BorealisAI / flora-opt
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.
☆104 Updated last year
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆232 Updated 3 months ago
apple / ml-entity-deduction-arena
☆34 Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆171 Updated 4 months ago
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆138 Updated 4 months ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆59 Updated last year
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆71 Updated last year