ZFancy/awesome-activation-engineering

A curated list of resources for activation engineering

☆140

Alternatives and similar repositories for awesome-activation-engineering

Users who are interested in awesome-activation-engineering are comparing it to the repositories listed below.

tmlr-group / G-effect
[ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond"
☆16 · Feb 27, 2025 · Updated last year
cma1114 / activation_steering
An exploration of LLM steering
☆28 · Jun 15, 2024 · Updated 2 years ago
tmlr-group / NoisyRationales
[NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"
☆40 · Jul 18, 2025 · Updated last year
Aboriginer / EOE
[ICML 2024] "Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection"
☆15 · Feb 15, 2025 · Updated last year
tmlr-group / CoPA
[NeurIPS 2024] "Mind the Gap between Prototypes and Images in Cross-domain Finetuning"
☆11 · Nov 15, 2024 · Updated last year
tmlr-group / Co-rewarding
[ICLR 2026] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"
☆58 · Feb 4, 2026 · Updated 5 months ago
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆156 · Nov 14, 2024 · Updated last year
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆241 · May 23, 2024 · Updated 2 years ago
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆181 · Sep 18, 2025 · Updated 10 months ago
ZFancy / DivOE
[NeurIPS 2023] "Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation"
☆11 · Oct 6, 2023 · Updated 2 years ago
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆210 · Mar 12, 2026 · Updated 4 months ago
tmlr-group / AlphaApollo
[arXiv:2510.06261] "AlphaApollo: A System for Deep Agentic Reasoning"
☆46 · May 18, 2026 · Updated 2 months ago
IBM / sae-steering
Code to enable layer-level steering in LLMs using sparse autoencoders
☆34 · Sep 18, 2025 · Updated 10 months ago
DanielSc4 / Dynamic-Activation-Composition
Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"
☆14 · Nov 22, 2024 · Updated last year
CaoYuanpu / BiPO
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
☆50 · Jul 28, 2024 · Updated last year
tmlr-group / EOE
[ICML 2024] "Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection"
☆13 · Feb 15, 2025 · Updated last year
slavachalnev / SAE-TS
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆29 · Nov 20, 2024 · Updated last year
zepingyu0512 / awesome-llm-understanding-mechanism
Awesome papers in LLM interpretability
☆624 · Aug 20, 2025 · Updated 11 months ago
weixuan-wang123 / SADI
☆19 · Sep 1, 2025 · Updated 10 months ago
itsqyh / Awesome-LMMs-Mechanistic-Interpretability
A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg…
☆215 · Mar 4, 2026 · Updated 4 months ago
tmlr-group / AR-Bench
[ICML 2025] "From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?"
☆47 · Oct 8, 2025 · Updated 9 months ago
tmlr-group / landscape-of-thoughts
[ICLR 2026] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models"
☆61 · May 21, 2026 · Updated 2 months ago
lyh6560new / P3Sum
The official code for the paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization"
☆10 · Jun 23, 2024 · Updated 2 years ago
cooperleong00 / Awesome-LLM-Interpretability
A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc.
☆308 · Jan 22, 2026 · Updated 6 months ago
ZBox1005 / AgentForesight
AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems
☆16 · May 12, 2026 · Updated 2 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Hugging Face
☆157 · Feb 21, 2025 · Updated last year
zzp1012 / Cross-Task-Linearity
[ICML 2024] Code release for "On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm"
☆11 · Feb 20, 2025 · Updated last year
shenlei515 / VHL-paddle
Translation of the VHL repo in Paddle
☆25 · Jun 28, 2023 · Updated 3 years ago
tmlr-group / ZS-NTTA
[ICLR 2025] "Noisy Test-Time Adaptation in Vision-Language Models"
☆13 · Feb 22, 2025 · Updated last year
tmlr-group / ECON
[ICML 2025] "From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium"
☆39 · Nov 23, 2025 · Updated 8 months ago
d-ailin / CLIP-Guided-Decoding
☆18 · Aug 1, 2024 · Updated last year
Aboriginer / ZS-NTTA
[ICLR 2025] "Noisy Test-Time Adaptation in Vision-Language Models"
☆16 · Feb 22, 2025 · Updated last year
RoyalSkye / ATCL
[NeurIPS 2022] "Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks"
☆13 · Nov 11, 2022 · Updated 3 years ago
leopoldwhite / Awesome-Inference-Time-Trustworthiness
☆15 · May 15, 2026 · Updated 2 months ago
jayneelparekh / learn-to-steer
[NeurIPS 2025] Official Implementation for Learning to Steer: Input-dependent Steering for Multimodal LLMs
☆19 · Dec 14, 2025 · Updated 7 months ago
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆64 · Mar 30, 2024 · Updated 2 years ago
noanabeshima / matryoshka-saes
☆33 · Nov 28, 2024 · Updated last year
ZJU-REAL / EasySteer
A Unified Framework for High-Performance and Extensible LLM Steering
☆288 · Apr 30, 2026 · Updated 2 months ago
montemac / activation_additions
Algebraic value editing in pretrained language models
☆71 · Nov 1, 2023 · Updated 2 years ago