Cornell-RL / drpoLinks

Dateset Reset Policy Optimization

☆31

Alternatives and similar repositories for drpo

Users that are interested in drpo are comparing it to the libraries listed below

Sorting:

Asap7772 / understanding-rlhf
Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…
☆32Updated last year
WentseChen / Verlog
Verlog: A Multi-turn RL framework for LLM agents
☆64Updated 2 weeks ago
abaheti95 / LoL-RL
Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
☆26Updated last year
shenao-zhang / BARL
Bayes-Adaptive RL for LLM Reasoning
☆41Updated 5 months ago
waterhorse1 / Natural-language-RL
Natural Language Reinforcement Learning
☆100Updated 3 months ago
facebookresearch / rlfh-gen-div
This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity
☆47Updated last year
sotopia-lab / sotopia-rl
Sotopia-RL: Reward Design for Social Intelligence
☆43Updated 3 months ago
allenai / sso
Repository for Skill Set Optimization
☆14Updated last year
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆57Updated last year
Fu-Dayuan / PreAct
PreAct: Prediction Enhances Agent's Planning Ability (Coling2025)
☆30Updated 11 months ago
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆21Updated last year
microsoft / RLHF-APA
RL algorithm: Advantage induced policy alignment
☆65Updated 2 years ago
mansicer / Q-Adapter
Implementation of ICLR 2025 paper "Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation"
☆18Updated last year
vwxyzjn / summarize_from_feedback_details
☆155Updated 11 months ago
cognitiveailab / GPT-simulator
☆30Updated last year
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44Updated 7 months ago
snu-mllab / DPPO
Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023)
☆42Updated last year
liziniu / policy_optimization
Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)
☆28Updated last year
zhourunlong / Reflect-RL
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
☆15Updated 4 months ago
mnoukhov / async_rlhf
Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models
☆67Updated 6 months ago
amazon-science / PAE
☆64Updated 8 months ago
sauc-abadal / ALT
Official repository for ALT (ALignment with Textual feedback).
☆10Updated last year
jhejna / cpl
Code for Contrastive Preference Learning (CPL)
☆176Updated last year
haozheji / exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
☆57Updated last year
yale-nlp / refdpo
☆16Updated last year
DualityRL / multi-attempt
☆19Updated 8 months ago
abdulhaim / LMRL-Gym
☆105Updated last year
ZhaolinGao / REFUEL
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
☆24Updated last year
swtheing / PF-PPO-RLHF
☆34Updated last year
gregorbachmann / Next-Token-Failures
☆104Updated last year