alecwangcq / f-divergence-dpoLinks

Direct preference optimization with f-divergences.

☆15

Alternatives and similar repositories for f-divergence-dpo

Users that are interested in f-divergence-dpo are comparing it to the libraries listed below

Sorting:

YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆198Updated 7 months ago
YangRui2015 / RiC
Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment"
☆78Updated 5 months ago
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆327Updated last year
ZHZisZZ / modpo
[ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
☆93Updated last year
OpenDFM / Rememberer
[NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents
☆38Updated last year
louieworth / awesome-rlhf
An index of algorithms for reinforcement learning from human feedback (rlhf))
☆92Updated last year
CJReinforce / PURE
Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆142Updated last month
alexrame / rewardedsoups
Rewarded soups official implementation
☆62Updated 2 years ago
liziniu / ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
☆198Updated last year
tlc4418 / llm_optimization
A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
☆45Updated 10 months ago
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆66Updated last year
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆182Updated 6 months ago
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization(TDPO)
☆148Updated 9 months ago
sanowl / Self-Correcting-LLM--Reinforcement-Learning-
This my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g…
☆37Updated 4 months ago
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆396Updated 4 months ago
CUHK-ARISE / GAMABench
Benchmarking LLMs' Gaming Ability in Multi-Agent Environments
☆88Updated 7 months ago
Linear95 / APO
Code for ACL2024 paper - Adversarial Preference Optimization (APO).
☆56Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆124Updated last year
YangRui2015 / Generalizable-Reward-Model
Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"
☆43Updated 9 months ago
WANGXinyiLinda / LM_random_walk
Official code for paper Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation
☆20Updated last year
WeiXiongUST / Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning
This is an official implementation of the paper ``Building Math Agents with Multi-Turn Iterative Preference Learning'' with multi-turn DP…
☆30Updated last year
Zhou-Zoey / RMB-Reward-Model-Benchmark
☆46Updated 8 months ago
kanishkg / cognitive-behaviors
☆216Updated 8 months ago
WooooDyy / LLM-Reverse-Curriculum-RL
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…
☆113Updated last year
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆167Updated 8 months ago
BunsenFeng / model_swarm
☆28Updated last year
glorgao / SelectiveDPO
Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
☆44Updated 4 months ago
ziyuwan / ReMA-public
Reinforced Multi-LLM Agents training
☆60Updated 5 months ago
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆159Updated last year
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆283Updated last year