MiaoXiong2320 / llm-uncertainty
code repo for ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs"
☆100 · Updated 11 months ago
Alternatives and similar repositories for llm-uncertainty:
Users who are interested in llm-uncertainty are comparing it to the libraries listed below.
- ☆83 · Updated 7 months ago
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ☆45 · Updated 5 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆50 · Updated 10 months ago
- ☆154 · Updated 8 months ago
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective ☆24 · Updated 3 weeks ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆62 · Updated 3 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆71 · Updated 2 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆115 · Updated 5 months ago
- Source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" ☆31 · Updated last month
- This is the official repo for Towards Uncertainty-Aware Language Agent. ☆24 · Updated 6 months ago
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156 ☆29 · Updated last year
- ☆30 · Updated 9 months ago
- [NAACL'25] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆47 · Updated 2 months ago
- ☆30 · Updated 4 months ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆32 · Updated 3 months ago
- ☆37 · Updated last year
- Lightweight Adapting for Black-Box Large Language Models ☆19 · Updated last year
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆49 · Updated 4 months ago
- ☆70 · Updated last month
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆126 · Updated this week
- Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment" ☆66 · Updated last year
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent." ☆44 · Updated 3 months ago
- ☆26 · Updated last year
- LoFiT: Localized Fine-tuning on LLM Representations ☆33 · Updated last month
- ☆47 · Updated last year
- ☆41 · Updated last week
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆66 · Updated 6 months ago
- ☆44 · Updated 6 months ago
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆23 · Updated 7 months ago