junkangwu / Dr_DPO
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
☆13 · Updated 11 months ago
Alternatives and similar repositories for Dr_DPO
Users interested in Dr_DPO are comparing it to the repositories listed below.
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆76 · Updated 8 months ago
- [ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization" ☆19 · Updated 7 months ago
- ☆10 · Updated 3 weeks ago
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆69 · Updated 4 months ago
- Rewarded soups official implementation ☆57 · Updated last year
- Direct preference optimization with f-divergences. ☆13 · Updated 6 months ago
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives". ☆22 · Updated 6 months ago
- [NeurIPS 2024] Official code of "β-DPO: Direct Preference Optimization with Dynamic β" ☆43 · Updated 6 months ago
- [NeurIPS 2023] Official code of "Understanding Contrastive Learning via Distributionally Robust Optimization" ☆40 · Updated last year
- Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples ☆33 · Updated last month
- ☆29 · Updated last week
- [NAACL 25 main] Awesome LLM Causal Reasoning is a collection of LLM-based causal reasoning works, including papers, code, and datasets. ☆59 · Updated 2 months ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation" ☆16 · Updated last week
- Repo of "Large Language Model-based Human-Agent Collaboration for Complex Task Solving" (EMNLP 2024 Findings) ☆32 · Updated 7 months ago
- ☆25 · Updated 11 months ago
- ☆18 · Updated last year
- Code for the NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ☆34 · Updated 2 months ago
- Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning" ☆40 · Updated last year
- Code for the paper "Toward Optimal LLM Alignments Using Two-Player Games" ☆16 · Updated 10 months ago
- Code for the paper "Aligning Large Language Models with Representation Editing: A Control Perspective" ☆29 · Updated 3 months ago
- ☆40 · Updated last year
- Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…" ☆58 · Updated last month
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" ☆28 · Updated last year
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" ☆29 · Updated last year
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆35 · Updated 4 months ago
- ☆30 · Updated 6 months ago
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities ☆14 · Updated last month
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ☆36 · Updated 9 months ago
- Models, data, and code for the paper "MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models" ☆18 · Updated 7 months ago
- This is my attempt to create Self-Correcting-LLM, based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by g… ☆35 · Updated last month