A resource repository for representation engineering in large language models
☆150 · Nov 14, 2024 · Updated last year
Alternatives and similar repositories for awesome-representation-engineering
Users who are interested in awesome-representation-engineering are comparing it to the libraries listed below.
- [ICLR 2025] General-purpose activation steering library ☆164 · Sep 18, 2025 · Updated 6 months ago
- Representation Engineering: A Top-Down Approach to AI Transparency ☆978 · Aug 14, 2024 · Updated last year
- Steering Llama 2 with Contrastive Activation Addition ☆222 · May 23, 2024 · Updated last year
- Steering vectors for transformer language models in Pytorch / Huggingface ☆148 · Feb 21, 2025 · Updated last year
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization ☆44 · Jul 28, 2024 · Updated last year
- Improving Alignment and Robustness with Circuit Breakers ☆260 · Sep 24, 2024 · Updated last year
- ☆14 · Feb 24, 2025 · Updated last year
- ☆23 · Jun 13, 2024 · Updated last year
- Algebraic value editing in pretrained language models ☆70 · Nov 1, 2023 · Updated 2 years ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆79 · Jan 16, 2026 · Updated 3 months ago
- ☆60 · Jun 13, 2024 · Updated last year
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆376 · Jun 13, 2025 · Updated 10 months ago
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆43 · Jan 18, 2026 · Updated 2 months ago
- A curated list of resources for activation engineering ☆134 · Oct 2, 2025 · Updated 6 months ago
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆13 · Jan 26, 2025 · Updated last year
- Code release for the paper "Style Vectors for Steering Generative Large Language Models", accepted to the Findings of the EACL 2024. ☆36 · Sep 26, 2024 · Updated last year
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition" ☆14 · Nov 22, 2024 · Updated last year
- ☆252 · Feb 22, 2024 · Updated 2 years ago
- ☆30 · Aug 2, 2024 · Updated last year
- Stanford NLP Python library for understanding and improving PyTorch models via interventions ☆870 · Mar 6, 2026 · Updated last month
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆108 · May 20, 2025 · Updated 10 months ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.. ☆300 · Jan 22, 2026 · Updated 2 months ago
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective ☆35 · Jan 31, 2025 · Updated last year
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆66 · Jun 9, 2025 · Updated 10 months ago
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors" ☆20 · Dec 14, 2024 · Updated last year
- The Github repo for our survey paper: "Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large… ☆114 · Updated this week
- ☆119 · Feb 11, 2025 · Updated last year
- Experiments with representation engineering ☆14 · Feb 28, 2024 · Updated 2 years ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆180 · Mar 12, 2026 · Updated last month
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) ☆144 · Jul 13, 2025 · Updated 9 months ago
- An exploration of LLM steering ☆26 · Jun 15, 2024 · Updated last year
- A library for making RepE control vectors ☆711 · Sep 24, 2025 · Updated 6 months ago
- Modified to support crosscoder training. ☆26 · Feb 4, 2026 · Updated 2 months ago
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability. ☆18 · Jan 14, 2025 · Updated last year
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model ☆574 · Jan 28, 2025 · Updated last year
- Code to the paper: The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence ☆28 · Jul 31, 2025 · Updated 8 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆104 · Sep 21, 2023 · Updated 2 years ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆250 · Updated this week
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons ☆13 · Feb 13, 2023 · Updated 3 years ago
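Several entries above, such as contrastive activation addition and the single refusal direction, build on a similar basic operation. First, compute a direction as the difference between mean activations on two contrasting sets of prompts. Then, at inference time, either add that direction to a hidden state or project it out. The snippet below is a minimal toy sketch of that difference-of-means idea. It assumes plain Python lists stand in for hidden-state vectors. The function names and numbers are hypothetical and are not taken from the API of any listed repository.

```python
from typing import List

# Hypothetical toy setup: a "hidden state" is just a list of floats.
Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Difference of means: mean(positive activations) - mean(negative activations)."""
    mu_pos = mean(pos_acts)
    mu_neg = mean(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def add_steering(h: Vector, v: Vector, alpha: float) -> Vector:
    """Shift hidden state h along direction v, scaled by alpha."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]


def ablate_direction(h: Vector, v: Vector) -> Vector:
    """Remove the component of h along v (project h onto v's orthogonal complement)."""
    norm_sq = sum(vi * vi for vi in v)
    coeff = sum(hi * vi for hi, vi in zip(h, v)) / norm_sq
    return [hi - coeff * vi for hi, vi in zip(h, v)]


if __name__ == "__main__":
    # Made-up activations for two contrasting prompt sets (not real model outputs).
    pos = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
    neg = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    v = steering_vector(pos, neg)
    print(v)                                     # [1.0, 2.0, 0.0]
    print(add_steering([0.0, 0.0, 1.0], v, 2.0))  # [2.0, 4.0, 1.0]
    print(ablate_direction([1.0, 2.0, 3.0], v))   # [0.0, 0.0, 3.0]
```

Actual implementations apply these operations to a transformer layer's activations, for example via PyTorch forward hooks. They typically choose the layer and the scale `alpha` empirically. This toy version leaves all of that out and only shows the vector arithmetic.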