zhliu0106 / learning-to-refuse
Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"
☆9 · Updated last month
Alternatives and similar repositories for learning-to-refuse:
Users interested in learning-to-refuse are comparing it to the repositories listed below.
- Implementation of AdaCQR (COLING 2025) ☆10 · Updated last month
- Official code for the paper "Evaluating Copyright Takedown Methods for Language Models" ☆17 · Updated 6 months ago
- BeHonest: Benchmarking Honesty in Large Language Models ☆31 · Updated 5 months ago
- ☆72 · Updated 8 months ago
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023) ☆25 · Updated last year
- Code & data for our paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆62 · Updated 11 months ago
- ☆34 · Updated 2 months ago
- [NeurIPS 2024] Can Language Models Learn to Skip Steps? ☆12 · Updated this week
- ☆52 · Updated 5 months ago
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆100 · Updated 4 months ago
- ☆40 · Updated last year
- Personality Alignment of Language Models ☆20 · Updated 4 months ago
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆106 · Updated 4 months ago
- Official code for the EMNLP 2023 main conference paper "KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detec… ☆30 · Updated last year
- Methods and evaluation for aligning language models temporally ☆27 · Updated 10 months ago
- ☆57 · Updated last month
- Source code for "Truth-Aware Context Selection: Mitigating the Hallucinations of Large Language Models Being Misled by Untruthful Contexts" ☆17 · Updated 4 months ago
- ☆12 · Updated 5 months ago
- [NAACL'25] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆41 · Updated 2 months ago
- ☆9 · Updated 4 months ago
- Code and data for "ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM" (NeurIPS 2024 Track Datasets and… ☆34 · Updated 3 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆13 · Updated last month
- ☆38 · Updated last year
- Code and data for "Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?" (ACL 2024) ☆32 · Updated 6 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing" ☆66 · Updated 2 weeks ago
- The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization… ☆14 · Updated 11 months ago
- We introduce ScaleQuest, a scalable, novel, and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆58 · Updated 3 months ago
- 🍼 Official implementation of "Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts" ☆36 · Updated 4 months ago
- Official repository for the ICLR 2024 Spotlight paper "Large Language Models Are Not Robust Multiple Choice Selectors" ☆38 · Updated 7 months ago