eric-mitchell / direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

☆2,752

Alternatives and similar repositories for direct-preference-optimization

Users who are interested in direct-preference-optimization are comparing it to the libraries listed below.

ContextualAI / HALOs
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
☆889 Updated 3 weeks ago
princeton-nlp / SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
☆923 Updated 8 months ago
allenai / RL4LMs
A modular RL library to fine-tune language models to human preferences
☆2,362 Updated last year
GAIR-NLP / O1-Journey
O1 Replication Journey
☆2,002 Updated 9 months ago
tatsu-lab / alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
☆1,877 Updated 2 months ago
XueFuzhao / OpenMoE
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
☆1,612 Updated last year
PKU-Alignment / safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
☆1,539 Updated last month
openai / prm800k
800,000 step-level correctness labels on LLM solutions to MATH problems
☆2,056 Updated 2 years ago
allenai / open-instruct
AllenAI's post-training codebase
☆3,252 Updated this week
OpenLMLab / MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO
☆1,399 Updated last year
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Asy…
☆8,180 Updated 2 weeks ago
AGI-Edgerunners / LLM-Adapters
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
☆1,201 Updated last year
RLHFlow / RLHF-Reward-Modeling
Recipes to train reward model for RLHF.
☆1,463 Updated 5 months ago
openreasoner / openr
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
☆1,823 Updated 9 months ago
openai / lm-human-preferences
Code for the paper Fine-Tuning Language Models from Human Preferences
☆1,370 Updated 2 years ago
zjunlp / EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
☆2,591 Updated last week
hkust-nlp / simpleRL-reason
Simple RL training for reasoning
☆3,773 Updated 2 months ago
opendilab / awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
☆4,170 Updated last month
anthropics / hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,788 Updated 4 months ago
Open-Reasoner-Zero / Open-Reasoner-Zero
Official Repo for Open-Reasoner-Zero
☆2,054 Updated 4 months ago
srush / awesome-o1
A bibliography and survey of the papers surrounding o1
☆1,209 Updated 11 months ago
philschmid / deep-learning-pytorch-huggingface
☆1,301 Updated 7 months ago
Tebmer / Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicit…
☆1,184 Updated 7 months ago
FranxYao / chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,748 Updated last year
pjlab-sys4nlp / llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
☆995 Updated 10 months ago
atfortes / Awesome-LLM-Reasoning
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
☆3,384 Updated 5 months ago
lucidrains / self-rewarding-lm-pytorch
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
☆1,399 Updated last year
SinclairCoder / Instruction-Tuning-Papers
Reading list of Instruction-tuning. A trend starts from Natural-Instruction (ACL 2022), FLAN (ICLR 2022) and T0 (ICLR 2022).
☆770 Updated 2 years ago
dqxiu / ICL_PaperList
Paper List for In-context Learning 🌷
☆867 Updated last year
uclaml / SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
☆1,206 Updated last year