junkangwu / Dr_DPO
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
☆13 · Updated 11 months ago
Alternatives and similar repositories for Dr_DPO
Users interested in Dr_DPO are comparing it to the repositories listed below.
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆76 · Updated 8 months ago
- [ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization" ☆19 · Updated 7 months ago
- ☆10 · Updated 3 weeks ago
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆69 · Updated 4 months ago
- Rewarded soups official implementation ☆57 · Updated last year
- Direct preference optimization with f-divergences. ☆13 · Updated 6 months ago
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives". ☆22 · Updated 6 months ago
- [NeurIPS 2024] Official code of "β-DPO: Direct Preference Optimization with Dynamic β" ☆43 · Updated 6 months ago
- [NeurIPS 2023] Official code of "Understanding Contrastive Learning via Distributionally Robust Optimization" ☆40 · Updated last year
- Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples ☆33 · Updated last month
- ☆29 · Updated last week
- [NAACL 25 main] Awesome LLM Causal Reasoning is a collection of LLM-based causal reasoning works, including papers, code, and datasets. ☆59 · Updated 2 months ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation" ☆16 · Updated last week
- Repo of "Large Language Model-based Human-Agent Collaboration for Complex Task Solving" (EMNLP 2024 Findings) ☆32 · Updated 7 months ago
- ☆25 · Updated 11 months ago
- ☆18 · Updated last year
- Code for the NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ☆34 · Updated 2 months ago
- Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning" ☆40 · Updated last year
- Code for the paper "Toward Optimal LLM Alignments Using Two-Player Games" ☆16 · Updated 10 months ago
- Code for the paper "Aligning Large Language Models with Representation Editing: A Control Perspective" ☆29 · Updated 3 months ago
- ☆40 · Updated last year
- Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…" ☆58 · Updated last month
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" ☆28 · Updated last year
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" ☆29 · Updated last year
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆35 · Updated 4 months ago
- ☆30 · Updated 6 months ago
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities ☆14 · Updated last month
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ☆36 · Updated 9 months ago
- Models, data, and code for the paper "MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models" ☆18 · Updated 7 months ago
- This is my attempt to create Self-Correcting-LLM, based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by g… ☆35 · Updated last month