peterljq / Parsimonious-Concept-Engineering
Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)
☆31 · Updated 2 months ago
Alternatives and similar repositories for Parsimonious-Concept-Engineering:
Users interested in Parsimonious-Concept-Engineering are comparing it to the repositories listed below.
- ☆12 · Updated 10 months ago
- ☆43 · Updated 5 months ago
- ☆40 · Updated last year
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆32 · Updated this week
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆50 · Updated 10 months ago
- ☆36 · Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆95 · Updated 10 months ago
- ☆23 · Updated 2 months ago
- Lightweight Adapting for Black-Box Large Language Models ☆19 · Updated 11 months ago
- ☆54 · Updated 3 weeks ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆29 · Updated last week
- [ICLR 2024] Unveiling the Pitfalls of Knowledge Editing for Large Language Models ☆22 · Updated 7 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆64 · Updated 7 months ago
- ☆30 · Updated last month
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆72 · Updated last month
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆35 · Updated 2 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning" ☆45 · Updated last month
- ☆52 · Updated last year
- [SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates ☆68 · Updated 3 months ago
- ☆75 · Updated 5 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆135 · Updated 3 months ago
- [NAACL'25] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆41 · Updated 2 months ago
- Codebase for Instruction Following without Instruction Tuning ☆33 · Updated 4 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆103 · Updated 10 months ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆53 · Updated 4 months ago
- ☆26 · Updated 3 months ago
- ☆46 · Updated last year
- A paper list on data contamination in large language model evaluation ☆88 · Updated 2 weeks ago
- Code for Adaptive Data Optimization ☆20 · Updated last month