tim-lawson / mlsae

Multi-Layer Sparse Autoencoders (ICLR 2025)

☆27

Alternatives and similar repositories for mlsae

Users who are interested in mlsae are comparing it to the libraries listed below.

dmis-lab / Monet
[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers
☆74 Updated 5 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆63 Updated last year
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆84 Updated last year
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆186 Updated 7 months ago
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆151 Updated 5 months ago
bartbussmann / matryoshka_sae
☆53 Updated 10 months ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆80 Updated 4 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆231 Updated last week
peterljq / Parsimonious-Concept-Engineering
PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)
☆41 Updated last year
KihoPark / linear_rep_geometry
☆110 Updated 9 months ago
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆43 Updated last year
AIRI-Institute / SAE-Reasoning
☆89 Updated 8 months ago
jacobdunefsky / llm-steering-opt
Tools for optimizing steering vectors in LLMs.
☆15 Updated 8 months ago
adamkarvonen / SAEBench
☆136 Updated 3 weeks ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55 Updated 7 months ago
ahans30 / goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
☆93 Updated last year
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆50 Updated 9 months ago
neelnanda-io / Crosscoders
☆58 Updated last year
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆24 Updated last year
Nix07 / finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…
☆28 Updated last month
IBM / sae-steering
Code to enable layer-level steering in LLMs using sparse auto encoders
☆28 Updated 2 months ago
RobertCsordas / moeut
☆89 Updated last year
adamkarvonen / SAE_BoardGameEval
☆23 Updated 10 months ago
shuyhere / Awesome-Sparse-Autoencoder
Collection of Reverse Engineering in Large Model
☆36 Updated 11 months ago
princeton-nlp / Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
☆62 Updated 3 months ago
MadryLab / DsDm
☆51 Updated last year
montemac / activation_additions
Algebraic value editing in pretrained language models
☆66 Updated 2 years ago
mcleish7 / gemstone-scaling-laws
Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)
☆30 Updated 2 months ago
saprmarks / geometry-of-truth
☆95 Updated last year
socialfoundations / tttlm
Test-time-training on nearest neighbors for large language models
☆48 Updated last year