HenryCai11 / LLM-Self-Control
The official repo of the paper "Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller"
☆18, updated 9 months ago
Alternatives and similar repositories for LLM-Self-Control
Users who are interested in LLM-Self-Control are comparing it to the repositories listed below.
- awesome SAE papers (☆27, updated 2 months ago)
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) (☆57, updated last year)
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models (☆32, updated 6 months ago)
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering (☆57, updated 5 months ago)
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style (☆40, updated last month)
- A curated list of resources for activation engineering (☆74, updated last week)
- (☆24, updated last month)
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective (☆29, updated 3 months ago)
- A resource repository for representation engineering in large language models (☆120, updated 6 months ago)