SparkJiao/dpo-trajectory-reasoning

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/SparkJiao/dpo-trajectory-reasoning)

SparkJiao / dpo-trajectory-reasoning

[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".

☆84

Alternatives and similar repositories for dpo-trajectory-reasoning

Users that are interested in dpo-trajectory-reasoning are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

THUDM / ReST-MCTS
View on GitHub
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
☆709Jan 20, 2025Updated last year
ernie-research / Tool-Augmented-Reward-Model
View on GitHub
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆54Jun 6, 2025Updated last year
NonvolatileMemory / flash_attn_gqa
View on GitHub
triton ver of gqa flash attn, based on the tutorial
☆12Aug 4, 2024Updated last year
FreedomIntelligence / OVM
View on GitHub
☆74Apr 2, 2024Updated 2 years ago
zjunlp / TRICE
View on GitHub
[NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback
☆43Mar 14, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
hbin0701 / Self-Explore
View on GitHub
[𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…
☆52May 4, 2024Updated 2 years ago
PRIME-RL / ImplicitPRM
View on GitHub
Repo of paper "Free Process Rewards without Process Labels"
☆172Mar 14, 2025Updated last year
hkust-nlp / dart-math
View on GitHub
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆120Dec 10, 2024Updated last year
Lux0926 / ASPRM
View on GitHub
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
☆10Mar 2, 2025Updated last year
qtli / GSM-Plus
View on GitHub
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆66Jul 8, 2024Updated 2 years ago
GXimingLu / IPA
View on GitHub
Codebase for Inference-Time Policy Adapters
☆25Nov 3, 2023Updated 2 years ago
sail-sg / CPO
View on GitHub
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆137Mar 21, 2025Updated last year
Yifan-Song793 / ETO
View on GitHub
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆168Oct 30, 2024Updated last year
JIA-Lab-research / Step-DPO
View on GitHub
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
☆398Jan 19, 2025Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
swtheing / PF-PPO-RLHF
View on GitHub
☆34Sep 14, 2024Updated last year
yuyq18 / StepTool
View on GitHub
☆36May 24, 2025Updated last year
starrYYxuan / LeCo
View on GitHub
This the implementation of LeCo
☆33Jan 20, 2025Updated last year
zongqianwu / ST-COT
View on GitHub
(ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training
☆13Feb 15, 2025Updated last year
WooSunghyeon / dropbp
View on GitHub
The official code for Dropping Backward Propagation (DropBP)
☆32Oct 29, 2024Updated last year
RLHFlow / Online-RLHF
View on GitHub
A recipe for online RLHF and online iterative DPO.
☆544Dec 28, 2024Updated last year
Lyun0912-wu / LongAttn
View on GitHub
LongAttn ：Selecting Long-context Training Data via Token-level Attention
☆15Jul 16, 2025Updated last year
WeiminXiong / MPO
View on GitHub
MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings)
☆81Aug 20, 2025Updated 11 months ago
icip-cas / SSO
View on GitHub
A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza…
☆20Nov 21, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
yifeiwang77 / Self-Correction
View on GitHub
☆20Nov 3, 2024Updated last year
yale-nlp / FinanceMath
View on GitHub
Data and Code for the paper "FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains"
☆25Jul 14, 2026Updated 2 weeks ago
xiwenc1 / DRA-GRPO
View on GitHub
Official code for the paper: DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
☆24Jan 6, 2026Updated 6 months ago
mukhal / GRACE
View on GitHub
[EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning
☆50Oct 11, 2024Updated last year
KbsdJames / Awesome-LLM-Preference-Learning
View on GitHub
The official repository of our survey paper: "Towards a Unified View of Preference Learning for Large Language Models: A Survey"
☆192Oct 28, 2024Updated last year
YuxiXie / MCTS-DPO
View on GitHub
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆331Jan 29, 2026Updated 6 months ago
sail-sg / LightTrans
View on GitHub
The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"
☆22Apr 22, 2025Updated last year
hanningzhang / prm
View on GitHub
☆17Nov 3, 2024Updated last year
WeiminXiong / IPR
View on GitHub
Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)
☆68Oct 18, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
wangjs9 / Aligned-dPM
View on GitHub
PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach
☆32Nov 6, 2023Updated 2 years ago
ictnlp / FA-DAT
View on GitHub
Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation"
☆14Mar 1, 2023Updated 3 years ago
hzy312 / knowledge-r1
View on GitHub
IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
☆70May 13, 2025Updated last year
lqtrung1998 / mwp_ReFT
View on GitHub
☆554Jan 2, 2025Updated last year
MARIO-Math-Reasoning / Super_MARIO
View on GitHub
☆341Jun 5, 2025Updated last year
TEAM-ARM / arm
View on GitHub
[NeurIPS'25 Spotlight] ARM: Adaptive Reasoning Model
☆68Apr 6, 2026Updated 3 months ago
GAIR-NLP / weak-to-strong-reasoning
View on GitHub
☆59Sep 2, 2024Updated last year