princeton-nlp / unintentional-unalignment
[ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
☆19 Updated 3 weeks ago
Alternatives and similar repositories for unintentional-unalignment:
Users who are interested in unintentional-unalignment are comparing it to the repositories listed below.
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives". ☆18 Updated 3 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆39 Updated 3 months ago
- Official code of the paper Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Le… ☆71 Updated 11 months ago
- Directional Preference Alignment ☆56 Updated 4 months ago
- ☆25 Updated last year
- ☆34 Updated last year
- ☆12 Updated last month
- Lightweight Adapting for Black-Box Large Language Models ☆19 Updated last year
- ☆49 Updated last year
- ☆80 Updated 11 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆32 Updated 9 months ago
- Augmenting Statistical Models with Natural Language Parameters ☆23 Updated 5 months ago
- ☆72 Updated 8 months ago
- ☆37 Updated last year
- Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs ☆34 Updated last year
- Evaluating the Ripple Effects of Knowledge Editing in Language Models ☆53 Updated 10 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆71 Updated last month
- ☆49 Updated last year
- ☆30 Updated 9 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆42 Updated 6 months ago
- ☆42 Updated last year
- A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643 ☆74 Updated last year
- ☆32 Updated last year
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆106 Updated 5 months ago
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆23 Updated 7 months ago
- Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models" ☆55 Updated last year
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA… ☆29 Updated 2 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆34 Updated last year
- ☆44 Updated 6 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆66 Updated 6 months ago