neverix / saex

SAEs in Jax

☆11

Alternatives and similar repositories for saex

Users who are interested in saex are comparing it to the libraries listed below.

ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆53 Updated 2 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆57 Updated 8 months ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆70 Updated 2 months ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆27 Updated last month
KihoPark / linear_rep_geometry
☆100 Updated 5 months ago
berlino / seq_icl
☆53 Updated last year
amack315 / unsupervised-steering-vectors
☆32 Updated last year
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28 Updated last year
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆18 Updated 6 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆193 Updated this week
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆77 Updated 7 months ago
EleutherAI / steering-llama3
☆29 Updated 11 months ago
google-deepmind / mishax
☆134 Updated 3 months ago
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆71 Updated last year
noanabeshima / tinymodel
A TinyStories LM with SAEs and transcoders
☆12 Updated 3 months ago
callummcdougall / sae_visualizer
☆28 Updated last year
jbloomAus / SAEDashboard
☆60 Updated this week
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆30 Updated last year
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆20 Updated 3 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Hugging Face
☆115 Updated 4 months ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆206 Updated 7 months ago
adamkarvonen / SAEBench
☆107 Updated this week
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆147 Updated 3 weeks ago
ejnnr / cupbearer
A library for mechanistic anomaly detection
☆22 Updated 6 months ago
KoyenaPal / future-lens
Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
☆18 Updated last year
Nix07 / finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…
☆27 Updated last year
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆102 Updated 3 weeks ago
neelnanda-io / 1L-Sparse-Autoencoder
☆123 Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆231 Updated 9 months ago
neelnanda-io / Crosscoders
☆47 Updated 8 months ago