[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆83Jan 14, 2025Updated last year
Alternatives and similar repositories for dpo-trajectory-reasoning
Users that are interested in dpo-trajectory-reasoning are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆54Jun 6, 2025Updated 10 months ago
- ☆74Apr 2, 2024Updated 2 years ago
- triton ver of gqa flash attn, based on the tutorial☆12Aug 4, 2024Updated last year
- [𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…☆51May 4, 2024Updated last year
- Repo of paper "Free Process Rewards without Process Labels"☆172Mar 14, 2025Updated last year
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback☆43Mar 14, 2024Updated 2 years ago
- LLM KV Cache compression - K+V dual compression, 73-99% VRAM savings, zero accuracy loss☆49Mar 30, 2026Updated 2 weeks ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆121Dec 10, 2024Updated last year
- MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings)☆80Aug 20, 2025Updated 7 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆64Jul 8, 2024Updated last year
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to…☆58Jul 4, 2023Updated 2 years ago
- (ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training☆13Feb 15, 2025Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆55Jul 23, 2025Updated 8 months ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆161Oct 30, 2024Updated last year
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆135Mar 21, 2025Updated last year
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆392Jan 19, 2025Updated last year
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆73Jul 13, 2025Updated 9 months ago
- ☆35Sep 14, 2024Updated last year
- This the implementation of LeCo☆32Jan 20, 2025Updated last year
- ☆35May 24, 2025Updated 10 months ago
- A recipe for online RLHF and online iterative DPO.☆543Dec 28, 2024Updated last year
- A curated list of cutting-edge research papers and resources on Long Chain-of-Thought (CoT) Reasoning with Tools.☆47Dec 17, 2025Updated 4 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza…☆20Nov 21, 2024Updated last year
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning☆50Oct 11, 2024Updated last year
- ☆20Nov 3, 2024Updated last year
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"☆22Apr 22, 2025Updated 11 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆329Jan 29, 2026Updated 2 months ago
- Data and Code for the paper "FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains"☆24Aug 10, 2024Updated last year
- The official code for Dropping Backward Propagation (DropBP)☆32Oct 29, 2024Updated last year
- ☆17Nov 3, 2024Updated last year
- ☆70Jun 18, 2025Updated 10 months ago
- Deploy open-source AI quickly and easily - Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆66Oct 18, 2024Updated last year
- PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach☆32Nov 6, 2023Updated 2 years ago
- Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation"☆14Mar 1, 2023Updated 3 years ago
- ☆10Jan 28, 2024Updated 2 years ago
- ☆553Jan 2, 2025Updated last year
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective☆43Sep 18, 2025Updated 7 months ago
- ☆341Jun 5, 2025Updated 10 months ago