noanabeshima / matryoshka-saes

☆23

Alternatives and similar repositories for matryoshka-saes

Users who are interested in matryoshka-saes are comparing it to the libraries listed below.

ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55 Updated 5 months ago
adamkarvonen / SAE_BoardGameEval
☆23 Updated 8 months ago
KihoPark / linear_rep_geometry
☆106 Updated 8 months ago
bartbussmann / matryoshka_sae
☆47 Updated 8 months ago
bilal-chughtai / rep-theory-mech-interp
☆27 Updated 2 years ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆56 Updated last year
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆77 Updated 2 months ago
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆30 Updated last year
jiahai-feng / binding-iclr
☆15 Updated last year
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28 Updated last year
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆63 Updated last year
DeqingFu / transformers-icl-second-order
Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode…
☆18 Updated 10 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆59 Updated 11 months ago
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆12 Updated last year
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 Updated 8 months ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆31 Updated 4 months ago
aadityasingh / icl-dynamics
☆22 Updated 5 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆127 Updated last year
milesaturpin / cot-unfaithfulness
☆48 Updated last year
saprmarks / feature-circuits
☆189 Updated 2 months ago
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆24 Updated 10 months ago
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆23 Updated 2 months ago
mechanistic-interpretability-grokking / progress-measures-paper
☆69 Updated 3 years ago
callummcdougall / sae-exercises-mats
☆23 Updated last year
redwoodresearch / Easy-Transformer
☆126 Updated last year
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆12 Updated 8 months ago
Nix07 / finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…
☆28 Updated last year
mlepori1 / NeuroSurgeon
NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers
☆41 Updated 8 months ago
p-lambda / incontext-learning
Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici…
☆108 Updated last year
adamkarvonen / SAEBench
☆127 Updated last week