HenryCai11 / LLM-Self-Control
The official repo of paper "Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller"
☆18Updated 3 months ago
Related projects ⓘ
Alternatives and complementary repositories for LLM-Self-Control
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆45Updated 7 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024)☆97Updated 7 months ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc..☆174Updated last month
- Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications☆60Updated last month
- The Paper List on Data Contamination for Large Language Models Evaluation.☆76Updated this week
- ☆81Updated last year
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers☆75Updated last month
- ☆39Updated last month
- Collection of Reverse Engineering in Large Model☆30Updated 2 weeks ago
- FeatureAlignment = Alignment + Mechanistic Interpretability☆15Updated last week
- Data and code for the Corr2Cause paper (ICLR 2024)☆88Updated 7 months ago
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"☆86Updated 2 months ago
- ☆12Updated 8 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.☆57Updated 2 weeks ago
- ☆28Updated 9 months ago
- An Open Source Implementation of Anthropic's Paper: "Towards Monosemanticity: Decomposing Language Models with Dictionary Learning"☆29Updated 6 months ago
- A resource repository for representation engineering in large language models☆54Updated last week
- Lightweight Adapting for Black-Box Large Language Models☆18Updated 9 months ago
- [NeurIPS 2024] Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models☆76Updated 3 months ago
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.☆48Updated this week
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)☆53Updated last month
- Repository of paper "LLMs with Chain-of-Thought Are Non-Causal Reasoners"☆15Updated 7 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors☆69Updated 9 months ago
- Function Vectors in Large Language Models (ICLR 2024)☆119Updated last month
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models☆19Updated this week
- ☆64Updated 2 months ago
- AdaICL: Which Examples to Annotate of In-Context Learning? Towards Effective and Efficient Selection☆16Updated last year
- Code for "Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models", ICLR 2024 Oral.☆20Updated 7 months ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)☆26Updated 2 weeks ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"☆47Updated last month