HenryCai11 / LLM-Self-Control
The official repository for the paper "Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller"
☆18 · Updated 5 months ago
Alternatives and similar repositories for LLM-Self-Control:
Users who are interested in LLM-Self-Control are comparing it to the repositories listed below
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆50 · Updated 9 months ago
- A resource repository for representation engineering in large language models ☆95 · Updated 2 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆102 · Updated 9 months ago
- Code for the EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models ☆26 · Updated 2 months ago
- Collection of Reverse Engineering in Large Model ☆31 · Updated last week
- Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆66 · Updated 3 months ago
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆30 · Updated this week
- ☆86 · Updated last year
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆31 · Updated 2 months ago
- Awesome SAE papers ☆13 · Updated this week
- LLM Unlearning ☆141 · Updated last year
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆14 · Updated last month
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆61 · Updated 2 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆57 · Updated 3 months ago
- ☆44 · Updated last year
- ☆153 · Updated 7 months ago
- [NeurIPS 2024] Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models ☆86 · Updated 5 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆132 · Updated 3 months ago
- ☆15 · Updated 3 months ago
- A curated list of LLM Interpretability related material: Tutorial, Library, Survey, Paper, Blog, etc. ☆196 · Updated 3 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆63 · Updated last week
- ☆104 · Updated last month
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ☆42 · Updated 4 months ago
- ☆18 · Updated last month
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆105 · Updated 4 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆71 · Updated 3 weeks ago
- Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models ☆86 · Updated 5 months ago
- ☆24 · Updated 3 months ago
- ☆21 · Updated 9 months ago
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆23 · Updated 6 months ago