MaheepChaudhary/SAE-Ravel

Answers the question "How to do patching on all available SAEs on GPT-2?". This is the official implementation of the paper "Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small".

☆13

Alternatives and similar repositories for SAE-Ravel

Users who are interested in SAE-Ravel are comparing it to the libraries listed below.

explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆58 · Oct 30, 2025 · Updated 8 months ago
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆17 · Dec 15, 2024 · Updated last year
ejnnr / cupbearer
A library for mechanistic anomaly detection
☆22 · Jan 9, 2025 · Updated last year
EleutherAI / steering-llama3
☆30 · Aug 2, 2024 · Updated last year
neuroexplicit-saar / Discover-then-Name
Code for the paper: Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery. ECCV 2024.
☆59 · Nov 3, 2024 · Updated last year
adamkarvonen / SAEBench
☆179 · May 1, 2026 · Updated 2 months ago
tilde-research / sieve
Applying SAEs for fine-grained control
☆27 · Dec 15, 2024 · Updated last year
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
thestephencasper / benchmarking_interpretability
☆35 · Sep 13, 2023 · Updated 2 years ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆33 · May 23, 2024 · Updated 2 years ago
Nix07 / belief_tracking
This repository contains the code used for the experiments in the paper "Language Models use Lookbacks to Track Beliefs".
☆16 · Mar 14, 2026 · Updated 4 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Huggingface
☆159 · Feb 21, 2025 · Updated last year
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆35 · Jun 11, 2025 · Updated last year
YanNeu / spurious_imagenet
Spurious Features Everywhere - Large-Scale Detection of Harmful Spurious Features in ImageNet
☆32 · Aug 22, 2023 · Updated 2 years ago
Blkalkin / Optimal-TestTime
☆10 · Mar 24, 2025 · Updated last year
hijohnnylin / neuronpedia-scorer
☆17 · Feb 14, 2024 · Updated 2 years ago
leopoldwhite / Awesome-Inference-Time-Trustworthiness
☆15 · May 15, 2026 · Updated 2 months ago
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆29 · Nov 20, 2024 · Updated last year
tim-lawson / mlsae
Multi-Layer Sparse Autoencoders (ICLR 2025)
☆30 · Feb 6, 2026 · Updated 5 months ago
UlisseMini / procgen-tools
Tools for running experiments on RL agents in procgen environments
☆20 · Apr 5, 2024 · Updated 2 years ago
rabeehk / robust-nli
☆17 · Jul 6, 2020 · Updated 6 years ago
RobertCsordas / onion_representations
☆13 · Aug 19, 2024 · Updated last year
roeehendel / icl_task_vectors
☆106 · Oct 30, 2023 · Updated 2 years ago
EleutherAI / training-jacobian
☆24 · Dec 11, 2024 · Updated last year
git-disl / Lisa
This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)
☆29 · Sep 10, 2024 · Updated last year
SERG-Delft / j2graph
☆10 · Aug 25, 2020 · Updated 5 years ago
ArthurConmy / Automatic-Circuit-Discovery
☆293 · Oct 1, 2024 · Updated last year
aleks-krasowski / PINNfluence
☆17 · Jun 3, 2026 · Updated last month
saprmarks / dictionary_learning
☆428 · Aug 21, 2025 · Updated 11 months ago
serre-lab / Horama
☆19 · May 1, 2025 · Updated last year
shuyhere / Awesome-Sparse-Autoencoder
Collection of Reverse Engineering in Large Model
☆35 · Jan 8, 2025 · Updated last year
EleutherAI / bergson
Mapping out the "memory" of neural nets with data attribution
☆71 · Updated this week
princeton-polaris-lab / Evaluating-Durable-Safeguards
[ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs
☆13 · Jun 20, 2025 · Updated last year
annahedstroem / sanity-checks-revisited
[NeurIPS XAIA & Springer] Code and notebooks to paper "A Fresh Look at Sanity Checks for Saliency Maps"
☆25 · Jul 12, 2024 · Updated 2 years ago
rmin2000 / adv_tracing
Identification of the Adversary from a Single Adversarial Example (ICML 2023)
☆10 · Jul 15, 2024 · Updated 2 years ago
SCIR-SC-Qiaoban-Team / FreeEvalLM
[AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…
☆11 · Feb 7, 2026 · Updated 5 months ago
aengusl / latent-adversarial-training
☆48 · Sep 29, 2024 · Updated last year
rycolab / artificial-languages
☆12 · Apr 19, 2022 · Updated 4 years ago
adamkarvonen / dictionary_learning_demo
☆26 · Aug 23, 2025 · Updated 11 months ago