[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆83Jan 14, 2025Updated last year
Alternatives and similar repositories for dpo-trajectory-reasoning
Users that are interested in dpo-trajectory-reasoning are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆702Jan 20, 2025Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆54Jun 6, 2025Updated 11 months ago
- ☆74Apr 2, 2024Updated 2 years ago
- [𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…☆51May 4, 2024Updated 2 years ago
- Repo of paper "Free Process Rewards without Process Labels"☆171Mar 14, 2025Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback☆43Mar 14, 2024Updated 2 years ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆121Dec 10, 2024Updated last year
- MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings)☆80Aug 20, 2025Updated 8 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆65Jul 8, 2024Updated last year
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to…☆58Jul 4, 2023Updated 2 years ago
- (ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training☆13Feb 15, 2025Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆55Jul 23, 2025Updated 9 months ago
- Codebase for Inference-Time Policy Adapters☆25Nov 3, 2023Updated 2 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆162Oct 30, 2024Updated last year
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆136Mar 21, 2025Updated last year
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆395Jan 19, 2025Updated last year
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆74Jul 13, 2025Updated 9 months ago
- ☆35Sep 14, 2024Updated last year
- This the implementation of LeCo☆32Jan 20, 2025Updated last year
- ☆35May 24, 2025Updated 11 months ago
- A recipe for online RLHF and online iterative DPO.☆544Dec 28, 2024Updated last year
- A curated list of cutting-edge research papers and resources on Long Chain-of-Thought (CoT) Reasoning with Tools.☆47Dec 17, 2025Updated 4 months ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- The official repository of our survey paper: "Towards a Unified View of Preference Learning for Large Language Models: A Survey"☆191Oct 28, 2024Updated last year
- LLM KV Cache compression - K+V dual compression, 73-99% VRAM savings, zero accuracy loss☆51Mar 30, 2026Updated last month
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning☆50Oct 11, 2024Updated last year
- ☆20Nov 3, 2024Updated last year
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"☆22Apr 22, 2025Updated last year
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆330Jan 29, 2026Updated 3 months ago
- Data and Code for the paper "FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains"☆24Aug 10, 2024Updated last year
- The official code for Dropping Backward Propagation (DropBP)☆32Oct 29, 2024Updated last year
- ☆17Nov 3, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆71Jun 18, 2025Updated 10 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆66Oct 18, 2024Updated last year
- PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach☆32Nov 6, 2023Updated 2 years ago
- Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation"☆14Mar 1, 2023Updated 3 years ago
- ☆10Jan 28, 2024Updated 2 years ago
- ☆552Jan 2, 2025Updated last year
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective☆43Sep 18, 2025Updated 7 months ago