montemac/activation_additions



Algebraic value editing in pretrained language models

☆71

Alternatives and similar repositories for activation_additions

Users who are interested in activation_additions compare it to the libraries listed below.


UlisseMini / ana
The AI that helps you achieve your goals
☆11 · Feb 4, 2024 · Updated 2 years ago
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆29 · Nov 20, 2024 · Updated last year
saprmarks / geometry-of-truth
☆114 · Aug 8, 2024 · Updated last year
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Jul 18, 2024 · Updated 2 years ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆268 · Feb 27, 2026 · Updated 5 months ago
callummcdougall / sae-exercises-mats
☆26 · Dec 20, 2023 · Updated 2 years ago
likenneth / honest_llama
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
☆581 · Jan 28, 2025 · Updated last year
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆159 · Feb 21, 2025 · Updated last year
EleutherAI / steering-llama3
☆30 · Aug 2, 2024 · Updated last year
neelnanda-io / Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
☆15 · Feb 13, 2023 · Updated 3 years ago
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆90 · Mar 7, 2025 · Updated last year
UlisseMini / procgen-tools
Tools for running experiments on RL agents in procgen environments
☆20 · Apr 5, 2024 · Updated 2 years ago
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆241 · May 23, 2024 · Updated 2 years ago
ArthurConmy / Automatic-Circuit-Discovery
☆293 · Oct 1, 2024 · Updated last year
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆58 · May 1, 2025 · Updated last year
shauli-ravfogel / adv-kernel-removal
☆12 · Oct 23, 2022 · Updated 3 years ago
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆156 · Nov 14, 2024 · Updated last year
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆258 · Jan 27, 2025 · Updated last year
milesaturpin / cot-unfaithfulness
☆57 · Oct 23, 2023 · Updated 2 years ago
redwoodresearch / Easy-Transformer
☆149 · Aug 4, 2024 · Updated last year
peterljq / Parsimonious-Concept-Engineering
PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)
☆43 · Jan 18, 2026 · Updated 6 months ago
andyzoujm / representation-engineering
Representation Engineering: A Top-Down Approach to AI Transparency
☆1,015 · Aug 14, 2024 · Updated last year
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53 · Jun 1, 2022 · Updated 4 years ago
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆105 · Sep 21, 2023 · Updated 2 years ago
cipher982 / llm-benchmarks
Benchmarking LLM Inference Speeds
☆14 · Jul 22, 2026 · Updated last week
zhxieml / remiss-jailbreak
☆33 · Jun 24, 2024 · Updated 2 years ago
madiodio / remark-twemoji
Remark plugin to replace your emoji by using Twemoji.
☆10 · Jul 21, 2026 · Updated last week
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆17 · Dec 15, 2024 · Updated last year
LiuAmber / RAHF
[ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…
☆28 · Sep 25, 2024 · Updated last year
noanabeshima / matryoshka-saes
☆33 · Nov 28, 2024 · Updated last year
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆120 · Jul 24, 2023 · Updated 3 years ago
tilde-research / sieve
Applying SAEs for fine-grained control
☆27 · Dec 15, 2024 · Updated last year
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆307 · Nov 10, 2023 · Updated 2 years ago
annahdo / implementing_activation_steering
A collection of different ways to implement accessing and modifying internal model activations for LLMs
☆24 · Oct 18, 2024 · Updated last year
shengliu66 / LC
Official Implementation of Avoiding spurious correlations via logit correction
☆17 · May 6, 2023 · Updated 3 years ago
danielway / nexrad-volumetric-renderer
Project exploring 3D volumetric rendering of NEXRAD radar data.
☆13 · Oct 23, 2023 · Updated 2 years ago
davidbau / baukit
☆257 · Feb 22, 2024 · Updated 2 years ago
UlisseMini / oth
Obsidian To HTML, A template for building obsidian style notes to a static site
☆19 · Nov 3, 2022 · Updated 3 years ago
hijohnnylin / neuronpedia-scorer
☆17 · Feb 14, 2024 · Updated 2 years ago