[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆83Jan 14, 2025Updated last year
Alternatives and similar repositories for dpo-trajectory-reasoning
Users that are interested in dpo-trajectory-reasoning are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆695Jan 20, 2025Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆54Jun 6, 2025Updated 9 months ago
- ☆73Apr 2, 2024Updated last year
- triton ver of gqa flash attn, based on the tutorial☆12Aug 4, 2024Updated last year
- [𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…☆51May 4, 2024Updated last year
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Repo of paper "Free Process Rewards without Process Labels"☆170Mar 14, 2025Updated last year
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback☆43Mar 14, 2024Updated 2 years ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆121Dec 10, 2024Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆64Jul 8, 2024Updated last year
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to…☆58Jul 4, 2023Updated 2 years ago
- (ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training☆13Feb 15, 2025Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆53Jul 23, 2025Updated 8 months ago
- Codebase for Inference-Time Policy Adapters☆25Nov 3, 2023Updated 2 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆161Oct 30, 2024Updated last year
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆135Mar 21, 2025Updated last year
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆392Jan 19, 2025Updated last year
- ☆35Sep 14, 2024Updated last year
- This the implementation of LeCo☆32Jan 20, 2025Updated last year
- ☆35May 24, 2025Updated 10 months ago
- A recipe for online RLHF and online iterative DPO.☆543Dec 28, 2024Updated last year
- Data and Code for the paper "FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains"☆24Aug 10, 2024Updated last year
- The official repository of our survey paper: "Towards a Unified View of Preference Learning for Large Language Models: A Survey"☆190Oct 28, 2024Updated last year
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- A curated list of cutting-edge research papers and resources on Long Chain-of-Thought (CoT) Reasoning with Tools.☆46Dec 17, 2025Updated 3 months ago
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning☆50Oct 11, 2024Updated last year
- ☆20Nov 3, 2024Updated last year
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"☆22Apr 22, 2025Updated 11 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆329Jan 29, 2026Updated 2 months ago
- The official code for Dropping Backward Propagation (DropBP)☆32Oct 29, 2024Updated last year
- ☆17Nov 3, 2024Updated last year
- ☆70Jun 18, 2025Updated 9 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆66Oct 18, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach☆32Nov 6, 2023Updated 2 years ago
- Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation"☆14Mar 1, 2023Updated 3 years ago
- ☆10Jan 28, 2024Updated 2 years ago
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective☆43Sep 18, 2025Updated 6 months ago
- ☆553Jan 2, 2025Updated last year
- ☆342Jun 5, 2025Updated 9 months ago
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent☆69May 13, 2025Updated 10 months ago