kxcloud / gradient-routing

☆9

Alternatives and similar repositories for gradient-routing

Users who are interested in gradient-routing are comparing it to the libraries listed below.

noanabeshima / tinymodel
A TinyStories LM with SAEs and transcoders
☆11Updated 2 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆52Updated last month
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆21Updated 7 months ago
ArthurConmy / MishformerLens
MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…
☆10Updated 8 months ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆67Updated 2 months ago
Butanium / nnterp
A small package implementing some useful wrapping around nnsight
☆13Updated this week
saprmarks / geometry-of-truth
☆85Updated 10 months ago
Jiaxin-Wen / MisleadLM
Official code for our paper: "Language Models Learn to Mislead Humans via RLHF"
☆14Updated 8 months ago
EleutherAI / steering-llama3
☆29Updated 10 months ago
IBM / sae-steering
Code to enable layer-level steering in LLMs using sparse auto encoders
☆19Updated last month
annahdo / implementing_activation_steering
A collection of different ways to implement accessing and modifying internal model activations for LLMs
☆18Updated 8 months ago
hannamw / EAP-IG
☆37Updated last month
bilal-chughtai / rep-theory-mech-interp
☆26Updated 2 years ago
jbloomAus / SAEDashboard
☆57Updated last week
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆108Updated 4 months ago
mishajw / repeng
Experiments with representation engineering
☆11Updated last year
KihoPark / LLM_Categorical_Hierarchical_Representations
☆99Updated 4 months ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆47Updated 8 months ago
neelnanda-io / Crosscoders
☆44Updated 7 months ago
bartbussmann / matryoshka_sae
☆34Updated 5 months ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆18Updated 5 months ago
bartbussmann / BatchTopK
Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)
☆42Updated last month
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆70Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆227Updated 8 months ago
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆35Updated last year
KihoPark / linear_rep_geometry
☆95Updated 4 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆55Updated 8 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆121Updated last year
JoshEngels / SAE-Dark-Matter
Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders"
☆22Updated 4 months ago
HugoFry / mats_sae_training_for_ViTs
☆18Updated last year