yuzhaouoe/SAE-based-representation-engineering

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/yuzhaouoe/SAE-based-representation-engineering)

yuzhaouoe / SAE-based-representation-engineering

[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering

☆83

Alternatives and similar repositories for SAE-based-representation-engineering

Users that are interested in SAE-based-representation-engineering are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

DanielSc4 / Dynamic-Activation-Composition
View on GitHub
Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"
☆14Nov 22, 2024Updated last year
aryopg / mmlu-redux
View on GitHub
☆32Nov 9, 2024Updated last year
explanare / ravel
View on GitHub
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆58Oct 30, 2025Updated 8 months ago
IBM / sae-steering
View on GitHub
Code to enable layer-level steering in LLMs using sparse auto encoders
☆34Sep 18, 2025Updated 10 months ago
alessiodevoto / l2compress
View on GitHub
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆19Dec 13, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
yuzhaouoe / pretraining-data-packing
View on GitHub
[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training
☆24Aug 18, 2024Updated last year
deeplearning-wisc / LUMINA
View on GitHub
Official implementation of ICLR 2026 paper "LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals"
☆18Jan 31, 2026Updated 5 months ago
chrisliu298 / awesome-representation-engineering
View on GitHub
A resource repository for representation engineering in large language models
☆156Nov 14, 2024Updated last year
aryopg / decore
View on GitHub
Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination"
☆30Dec 18, 2024Updated last year
EleutherAI / sparsify
View on GitHub
Sparsify transformers with SAEs and transcoders
☆734Updated this week
slavachalnev / SAE-TS
View on GitHub
Improving Steering Vectors by Targeting Sparse Autoencoder Features
☆29Nov 20, 2024Updated last year
nrimsky / CAA
View on GitHub
Steering Llama 2 with Contrastive Activation Addition
☆241May 23, 2024Updated 2 years ago
cooperleong00 / Awesome-LLM-Interpretability
View on GitHub
A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc..
☆308Jan 22, 2026Updated 6 months ago
javiferran / sae_entities
View on GitHub
☆78Mar 6, 2025Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
openai / sparse_autoencoder
View on GitHub
☆597Jul 19, 2024Updated 2 years ago
lasgroup / user_interactions
View on GitHub
Aligning Language Models from User Interactions via Self-Distillation
☆26Mar 31, 2026Updated 3 months ago
microsoft / llm-steer-instruct
View on GitHub
A method for steering llms to better follow instructions
☆96Jun 10, 2026Updated last month
ruizheliUOA / ARC_JSD
View on GitHub
A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
☆15Aug 28, 2025Updated 10 months ago
BunsenFeng / AbstainQA
View on GitHub
AbstainQA, ACL 2024
☆29Feb 4, 2026Updated 5 months ago
francescortu / comp-mech
View on GitHub
Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals; ACL 2024
☆13May 24, 2024Updated 2 years ago
EleutherAI / delphi
View on GitHub
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆268Updated this week
byronBBL / Context-DPO
View on GitHub
Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness"
☆23Feb 17, 2025Updated last year
HKUNLP / STRING
View on GitHub
[ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?"
☆82Nov 25, 2024Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
kaistAI / Knowledge-Entropy
View on GitHub
[ICLR 2025 Oral] Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
☆17Nov 25, 2024Updated last year
DLR-SC / style-vectors-for-steering-llms
View on GitHub
Code release for the paper "Style Vectors for Steering Generative Large Language Models", accepted to the Findings of the EACL 2024.
☆37Sep 26, 2024Updated last year
pillowsofwind / Knowledge-Conflicts-Survey
View on GitHub
[EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"
☆159Sep 21, 2024Updated last year
maitrix-org / dynamic-alignment-optimization
View on GitHub
[EMNLP'24 (Main)] DRPO(Dynamic Rewarding with Prompt Optimization) is a tuning-free approach for self-alignment. DRPO leverages a search-…
☆24Nov 17, 2024Updated last year
jbloomAus / SAEDashboard
View on GitHub
☆109May 23, 2026Updated 2 months ago
IBM / activation-steering
View on GitHub
[ICLR 2025] General-purpose activation steering library
☆181Sep 18, 2025Updated 10 months ago
technion-cs-nlp / LLMsKnow
View on GitHub
☆95Jan 22, 2025Updated last year
ictnlp / TruthX
View on GitHub
Code for ACL 2024 paper "TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space"
☆144Mar 26, 2024Updated 2 years ago
kmeng01 / rome
View on GitHub
Locating and editing factual associations in GPT (NeurIPS 2022)
☆770Apr 20, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
decoderesearch / SAELens
View on GitHub
Training Sparse Autoencoders on Language Models
☆1,484Updated this week
AngelaZZZ-611 / reasoning_models_probing
View on GitHub
☆21May 14, 2026Updated 2 months ago
belindal / ERASE
View on GitHub
Code and Data for "Language Modeling with Editable External Knowledge"
☆37Jun 19, 2024Updated 2 years ago
ckkissane / crosscoder-model-diff-replication
View on GitHub
Open source replication of Anthropic's Crosscoders for Model Diffing
☆68Oct 27, 2024Updated last year
google-research-datasets / QuoteSum
View on GitHub
QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on …
☆13Mar 25, 2024Updated 2 years ago
Lingkai-Kong / RE-Control
View on GitHub
Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective
☆35Jan 31, 2025Updated last year
nightdessert / Retrieval_Head
View on GitHub
open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality
☆241Aug 2, 2024Updated last year