Lingkai-Kong / RE-Control
Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective
☆24 Updated last month
Alternatives and similar repositories for RE-Control:
Users who are interested in RE-Control are comparing it to the repositories listed below:
- Lightweight Adapting for Black-Box Large Language Models ☆20 Updated last year
- ☆35 Updated last year
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives". ☆19 Updated 4 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆62 Updated 5 months ago
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆33 Updated last month
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆74 Updated 7 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆64 Updated 3 months ago
- ☆50 Updated last year
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆16 Updated 2 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆71 Updated this week
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆12 Updated 8 months ago
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆57 Updated 2 months ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆33 Updated 3 months ago
- ☆30 Updated 5 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆70 Updated 6 months ago
- ☆19 Updated 7 months ago
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] ☆39 Updated 4 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆50 Updated 11 months ago
- ☆26 Updated last year
- [NAACL'25] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆49 Updated 3 months ago
- ☆45 Updated 6 months ago
- The official repo of paper "Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller" ☆18 Updated 6 months ago
- A resource repository for representation engineering in large language models ☆107 Updated 3 months ago
- ☆89 Updated last year
- LoFiT: Localized Fine-tuning on LLM Representations ☆33 Updated last month
- Rewarded soups official implementation ☆54 Updated last year
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆95 Updated last year
- This is the official repo for Towards Uncertainty-Aware Language Agent. ☆24 Updated 6 months ago
- ☆12 Updated 11 months ago
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆35 Updated last year