jacobdunefsky/llm-steering-opt

Tools for optimizing steering vectors in LLMs.

☆22

Alternatives and similar repositories for llm-steering-opt

Users who are interested in llm-steering-opt are comparing it to the libraries listed below.

TransluceAI / introspective-interp
Repository for "Training Language Models To Explain Their Own Computations"
☆23 · Jul 7, 2026 · Updated 3 weeks ago
science-of-finetuning / crosscoder_learning
Modified to support crosscoder training.
☆27 · Jul 2, 2026 · Updated 3 weeks ago
curt-tigges / probity
☆19 · Apr 10, 2025 · Updated last year
curt-tigges / crosslayer-coding
☆18 · Jul 9, 2025 · Updated last year
NLie2 / what_features_jailbreak_LLMs
☆18 · Mar 30, 2025 · Updated last year
science-of-finetuning / diffing-toolkit
A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.
☆78 · Jul 20, 2026 · Updated last week
kutay25 / ai-safety-alignment-camps
A repository of 2025-2026 AI Safety and Alignment programs, camps, and workshops.
☆22 · May 18, 2025 · Updated last year
UKGovernmentBEIS / vllm-lens
Extract residual-stream activations and apply steering vectors (including activation oracles) to any vLLM model during inference.
☆117 · Updated this week
matchten / LoRA-Models-for-SAEs
Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"
☆17 · Mar 31, 2025 · Updated last year
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
ndif-team / nnterp
Unified access to Large Language Model modules using NNsight
☆116 · Updated this week
plumdeq / hsvm
Hyperbolic SVM in Python
☆12 · Jun 21, 2022 · Updated 4 years ago
thejaminator / latteries
James' cookbook of evaluations and finetuning experiments
☆32 · Feb 19, 2026 · Updated 5 months ago
neelnanda-io / neel-plotly
A very hacky set of functions for getting plotly to do what I want when doing mech interp research, designed to be compatible with PyTorc…
☆15 · Jun 16, 2023 · Updated 3 years ago
tilde-research / sieve
Applying SAEs for fine-grained control
☆27 · Dec 15, 2024 · Updated last year
oclivegriffin / crosscode
A library for training crosscoders
☆17 · May 28, 2025 · Updated last year
fiveai / understanding_safety_finetuning
Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)
☆12 · Oct 31, 2024 · Updated last year
jammastergirish / LLMProbe
☆20 · Dec 10, 2025 · Updated 7 months ago
adamkarvonen / dictionary_learning_demo
☆26 · Aug 23, 2025 · Updated 11 months ago
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆17 · Oct 21, 2025 · Updated 9 months ago
efarrell1 / train_sparse_autoencoder
Trains Sparse Autoencoders based on outputs from language models
☆11 · Oct 7, 2024 · Updated last year
harish-kamath / rqae
Residual Quantization Autoencoder, used for interpreting LLMs
☆14 · Jan 1, 2025 · Updated last year
ApolloResearch / deception-detection
☆44 · Feb 11, 2025 · Updated last year
edeyneka / pdf-reader-extension
☆13 · Mar 9, 2025 · Updated last year
ARBORproject / arborproject.github.io
☆86 · Feb 25, 2025 · Updated last year
yash-srivastava19 / arrakis
Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
☆31 · Jul 8, 2026 · Updated 2 weeks ago
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆17 · Dec 15, 2024 · Updated last year
ajyl / mech_int_othelloGPT
☆10 · Nov 6, 2024 · Updated last year
Psi-Prod / ppx_system
ppx_system is a syntax extension to know the operating system at compile time
☆12 · May 9, 2023 · Updated 3 years ago
ArthurConmy / MishformerLens
MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…
☆10 · Oct 7, 2024 · Updated last year
ajobi-uhc / seer
This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix …
☆146 · Feb 8, 2026 · Updated 5 months ago
JasonGross / guarantees-based-mechanistic-interpretability
☆18 · Jul 21, 2026 · Updated last week
ArulselvanMadhavan / mini_dalle
mini-dalle in OCaml
☆39 · Nov 6, 2022 · Updated 3 years ago
Butanium / monte-carlo-tree-search-TSP
Monte Carlo tree search for the travelling salesman problem (MCTS for the TSP)
☆12 · Jun 18, 2022 · Updated 4 years ago
jorispos / ConceptorSteering
☆16 · Mar 13, 2025 · Updated last year
adamkarvonen / activation_oracles
☆96 · Apr 18, 2026 · Updated 3 months ago
goodfire-ai / param-decomp
Parameter Decomposition
☆136 · Updated this week
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆68 · Oct 27, 2024 · Updated last year
zhenyi4 / codi
Official repository for "CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation"
☆102 · Dec 15, 2025 · Updated 7 months ago