shuyhere/Awesome-Sparse-Autoencoder

A collection of resources on reverse engineering large models.

☆35

Alternatives and similar repositories for Awesome-Sparse-Autoencoder

Users who are interested in Awesome-Sparse-Autoencoder are comparing it to the libraries listed below.

MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
zjunlp / PitfallsKnowledgeEditing
[ICLR 2024] Unveiling the Pitfalls of Knowledge Editing for Large Language Models
☆22 · Jun 13, 2024 · Updated 2 years ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆267 · Updated this week
evandez / relations
How do transformer LMs encode relations?
☆59 · Feb 24, 2024 · Updated 2 years ago
cooperleong00 / Awesome-LLM-Interpretability
A curated list of LLM interpretability material: tutorials, libraries, surveys, papers, blogs, etc.
☆308 · Jan 22, 2026 · Updated 6 months ago
openai / sparse_autoencoder
☆596 · Jul 19, 2024 · Updated 2 years ago
TransluceAI / .github
☆19 · Dec 12, 2025 · Updated 7 months ago
hartvigsen-group / composable-interventions
☆29 · Feb 27, 2025 · Updated last year
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆306 · Nov 10, 2023 · Updated 2 years ago
science-of-finetuning / sparsity-artifacts-crosscoders
Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.
☆17 · Jul 6, 2026 · Updated 2 weeks ago
nightdessert / Retrieval_Head
Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality"
☆241 · Aug 2, 2024 · Updated last year
Jometeorie / MultiHopShortcuts
Reproduction Code for Paper "Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models"
☆14 · Jun 1, 2024 · Updated 2 years ago
MikaStars39 / FeatureAlignment
FeatureAlignment = Alignment + Mechanistic Interpretability
☆35 · Mar 8, 2025 · Updated last year
francescortu / comp-mech
Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals; ACL 2024
☆13 · May 24, 2024 · Updated 2 years ago
laihuiyuan / multilingual-tst
Multilingual Pre-training with Language and Task Adaptation for Multilingual Text Style Transfer (ACL 2022)
☆10 · Sep 22, 2022 · Updated 3 years ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆267 · Feb 27, 2026 · Updated 4 months ago
alestolfo / lm-arithmetic
Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis"
☆20 · Jun 12, 2025 · Updated last year
KunKuang / Decorrelated-Weighted-Regression
Stable Prediction with Model Misspecification and Agnostic Distribution Shift
☆26 · Apr 26, 2020 · Updated 6 years ago
baixianghuang / editing-attack
Code and dataset for the paper: "Can Editing LLMs Inject Harm?" [AAAI'26]
☆21 · Dec 26, 2025 · Updated 6 months ago
zepingyu0512 / neuron-attribution
Code for the EMNLP 2024 paper "Neuron-Level Knowledge Attribution in Large Language Models"
☆52 · Nov 17, 2024 · Updated last year
Incredible88 / FinBERT-FOMC
A FinBERT model fine-tuned with a focus on sentiment, using FOMC minutes as the training data.
☆10 · Oct 26, 2024 · Updated last year
trestad / Factual-Recall-Mechanism
Code for the paper "Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models".
☆13 · Apr 10, 2024 · Updated 2 years ago
wanghaisheng / clinical-decision-support-book
Survey of the State of the Art in structural clinical knowledge
☆11 · Feb 7, 2015 · Updated 11 years ago
AISG-Technology-Team / AISG-Online-Safety-Challenge-Submission-Guide
Submission Guide + Discussion Board for AI Singapore Online Safety Prize Challenge
☆14 · Mar 20, 2024 · Updated 2 years ago
jbloomAus / SAEDashboard
☆109 · May 23, 2026 · Updated 2 months ago
zjunlp / knowledge-rumination
[EMNLP 2023] Knowledge Rumination for Pre-trained Language Models
☆17 · Jun 29, 2023 · Updated 3 years ago
skolouri / TopoTrans
TopoTrans: Optimal Transport meets Topological Data Analysis
☆14 · Apr 20, 2023 · Updated 3 years ago
liaolea / TransPrune
[CVPR 2026] TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model
☆17 · Feb 23, 2026 · Updated 5 months ago
JShollaj / awesome-llm-interpretability
A curated list of Large Language Model (LLM) Interpretability resources.
☆1,629 · Feb 24, 2026 · Updated 4 months ago
rycolab / artificial-languages
☆12 · Apr 19, 2022 · Updated 4 years ago
susumuota / nano-askllm
Unofficial implementation of the Ask-LLM paper 'How to Train Data-Efficient LLMs', arXiv:2402.09668.
☆12 · Jun 19, 2024 · Updated 2 years ago
HanSolo9682 / CounterCurate
Implementation of CounterCurate, a data-curation pipeline for both physical and semantic counterfactual image-caption pairs.
☆19 · Jun 27, 2024 · Updated 2 years ago
mariomeissner / lightning-hydra-transformers
My take on how you should organize your transformer experiments.
☆13 · Apr 13, 2022 · Updated 4 years ago
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆733 · Updated this week
zer0int / CLIP-SAE-finetune
Sparse Autoencoders (SAE) vs CLIP fine-tuning fun.
☆18 · Dec 19, 2024 · Updated last year
hscells / pybool_ir
Toolkit for domain-specific information retrieval experimentation
☆19 · May 18, 2026 · Updated 2 months ago
loispaulin / Sliced-Optimal-Transport-Sampling
☆16 · Sep 27, 2023 · Updated 2 years ago
javiferran / sae_entities
☆78 · Mar 6, 2025 · Updated last year
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆35 · Jun 11, 2025 · Updated last year