TianduoWang / DPO-STLinks

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

☆45

Alternatives and similar repositories for DPO-ST

Users that are interested in DPO-ST are comparing it to the libraries listed below

Sorting:

liziniu / GEM
Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)
☆33Updated 2 months ago
GuanghaoYe / Emergence-of-Thinking
☆52Updated 5 months ago
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆91Updated 2 months ago
haozheji / exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
☆58Updated last year
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆33Updated 9 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆86Updated 9 months ago
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆38Updated last year
icip-cas / Verifier-Engineering
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
☆61Updated 7 months ago
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆77Updated last month
yyDing1 / ScaleQuest
[ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
☆63Updated 8 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆103Updated 2 months ago
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆58Updated 9 months ago
zhiyuanhubj / LongRecipe
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
☆76Updated 9 months ago
SihengLi99 / SEALONG
Large Language Models Can Self-Improve in Long-context Reasoning
☆71Updated 7 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆74Updated 3 months ago
junkangwu / beta-DPO
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
☆45Updated 8 months ago
holarissun / RewardModelingBeyondBradleyTerry
official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆64Updated 3 months ago
idanshen / Value-Augmented-Sampling
☆19Updated last year
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆50Updated last month
uservan / ThinkPO
☆18Updated 4 months ago
test-time-interaction / TTI
☆48Updated last month
ars22 / scaling-LLM-math-synthetic-data
Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"
☆30Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123Updated 10 months ago
activatedgeek / calibration-tuning
☆51Updated 3 months ago
SynthLabsAI / big-math
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
☆58Updated 4 months ago
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆127Updated this week
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆47Updated 4 months ago
fangyuan-ksgk / CoT-Reasoning-without-Prompting
Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting
☆32Updated last year
joeljang / RLPHF
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
☆108Updated last year
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆52Updated 5 months ago