mathllm / Step-Controlled_DPOLinks

☆22

Alternatives and similar repositories for Step-Controlled_DPO

Users that are interested in Step-Controlled_DPO are comparing it to the libraries listed below

Sorting:

jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆16Updated 10 months ago
yayayacc / MUR
☆45Updated 3 weeks ago
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆38Updated last year
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆30Updated 2 months ago
uservan / ThinkPO
☆17Updated 2 months ago
DAMO-NLP-SG / LongPO
[ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
☆41Updated 7 months ago
rookie-joe / AutoPSV
☆50Updated 11 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆82Updated 7 months ago
lichengliu03 / unary-feedback
☆38Updated 2 months ago
yyDing1 / ScaleQuest
[ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
☆68Updated 11 months ago
fangyuan-ksgk / CoT-Reasoning-without-Prompting
Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting
☆33Updated last year
RUCAIBox / JiuZhang3.0
The code and data for the paper JiuZhang3.0
☆49Updated last year
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆36Updated last year
LAMDASZ-ML / Self-Backtracking
☆50Updated 8 months ago
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆88Updated 10 months ago
test-time-interaction / TTI
☆63Updated 4 months ago
ChengpengLi1003 / DotaMath
☆30Updated 9 months ago
rhyang2021 / ARIA
Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".
☆23Updated 2 months ago
NuoJohnChen / JudgeLRM
JudgeLRM: Large Reasoning Models as a Judge
☆40Updated last month
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆23Updated 10 months ago
SihengLi99 / SEALONG
Large Language Models Can Self-Improve in Long-context Reasoning
☆73Updated 11 months ago
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆69Updated 6 months ago
yale-nlp / refdpo
☆16Updated last year
THUDM / TreeRL
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25
☆71Updated 4 months ago
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆111Updated 9 months ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆73Updated 2 weeks ago
THU-KEG / RM-Bench
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆63Updated 3 months ago
hkust-nlp / GUIMid
☆21Updated 5 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆114Updated 5 months ago
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆27Updated 2 weeks ago