amack315/unsupervised-steering-vectors

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/amack315/unsupervised-steering-vectors)

amack315 / unsupervised-steering-vectors

☆38

Alternatives and similar repositories for unsupervised-steering-vectors

Users that are interested in unsupervised-steering-vectors are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

alan-cooney / transformer-from-scratch
View on GitHub
Decoder only transformer, built from scratch with PyTorch
☆33Oct 22, 2023Updated 2 years ago
ckkissane / sae-transfer
View on GitHub
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13Jul 18, 2024Updated 2 years ago
ApolloResearch / sample
View on GitHub
Repository with sample code using Apollo's suggested engineering practices
☆15Dec 16, 2024Updated last year
Zhang-Yihao / Adversarial-Representation-Engineering
View on GitHub
Official implementation repository for the paper Towards General Conceptual Model Editing via Adversarial Representation Engineering.
☆20Dec 6, 2024Updated last year
callummcdougall / sae_vis
View on GitHub
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆268Feb 27, 2026Updated 5 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
alan-cooney / transformer-lens-starter-template
View on GitHub
A quick way to get started with Transformer Lens
☆14Dec 13, 2023Updated 2 years ago
Cadenza-Labs / sleeper-agents
View on GitHub
☆15Jul 12, 2024Updated 2 years ago
adamkarvonen / SAEBench
View on GitHub
☆179May 1, 2026Updated 2 months ago
EleutherAI / delphi
View on GitHub
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆269Updated this week
ApolloResearch / e2e_sae
View on GitHub
Sparse Autoencoder Training Library
☆58May 1, 2025Updated last year
curt-tigges / crosslayer-coding
View on GitHub
☆18Jul 9, 2025Updated last year
jbloomAus / SAEDashboard
View on GitHub
☆109May 23, 2026Updated 2 months ago
JoshEngels / SAE-Dark-Matter
View on GitHub
Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders"
☆23Feb 6, 2025Updated last year
Confirm-Solutions / dreamy
View on GitHub
Fluent dreaming for language models
☆13Jul 22, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
ai-safety-foundation / sparse_autoencoder
View on GitHub
Sparse Autoencoder for Mechanistic Interpretability
☆303Jul 20, 2024Updated 2 years ago
saprmarks / feature-circuits
View on GitHub
☆223Oct 14, 2025Updated 9 months ago
noanabeshima / matryoshka-saes
View on GitHub
☆33Nov 28, 2024Updated last year
Jiaxin-Wen / MisleadLM
View on GitHub
Official Code for our paper: "Language Models Learn to Mislead Humans via RLHF""
☆20Oct 11, 2024Updated last year
anthropics / sycophancy-to-subterfuge-paper
View on GitHub
☆28Sep 5, 2024Updated last year
lasr-spelling / sae-spelling
View on GitHub
Code for the paper "A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders"
☆16Dec 28, 2025Updated 7 months ago
UlisseMini / ana
View on GitHub
The AI that helps you achieve your goals
☆11Feb 4, 2024Updated 2 years ago
TeunvdWeij / sandbagging
View on GitHub
☆21Nov 15, 2024Updated last year
redwoodresearch / remix_public
View on GitHub
☆20Feb 17, 2023Updated 3 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
clarifying-EM / model-organisms-for-EM
View on GitHub
Code repo for the model organisms and convergent directions of EM papers.
☆72Sep 22, 2025Updated 10 months ago
alexander-turner / TurnTrout.com
View on GitHub
A blog on AI, personal development, and living a good life.
☆46Updated this week
simple-stories / simple_stories_train
View on GitHub
Trains small LMs. Designed for training on SimpleStories
☆14Sep 15, 2025Updated 10 months ago
EleutherAI / elk-generalization
View on GitHub
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆33May 23, 2024Updated 2 years ago
noanabeshima / tinymodel
View on GitHub
A TinyStories LM with SAEs and transcoders
☆14Apr 3, 2025Updated last year
MassDynamics / protein-inference
View on GitHub
A python package for protein inference in Mass Spectrometric data analysis.
☆10Jun 6, 2022Updated 4 years ago
ndif-team / nnsight
View on GitHub
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆1,005Updated this week
nrimsky / CAA
View on GitHub
Steering Llama 2 with Contrastive Activation Addition
☆241May 23, 2024Updated 2 years ago
rgreenblatt / model_organism_public
View on GitHub
☆15Jun 17, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
ndif-team / nnterp
View on GitHub
Unified access to Large Language Model modules using NNsight
☆116Updated this week
montemac / activation_additions
View on GitHub
Algebraic value editing in pretrained language models
☆71Nov 1, 2023Updated 2 years ago
JoshEngels / MultiDimensionalFeatures
View on GitHub
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆90Nov 27, 2024Updated last year
chanind / linear-relational
View on GitHub
Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch
☆11Aug 7, 2024Updated last year
taufeeque9 / codebook-features
View on GitHub
Sparse and discrete interpretability tool for neural networks
☆65Feb 12, 2024Updated 2 years ago
JasonGross / guarantees-based-mechanistic-interpretability
View on GitHub
☆18Jul 21, 2026Updated last week
tilde-research / sieve
View on GitHub
Applying SAEs for fine-grained control
☆27Dec 15, 2024Updated last year