SparkJiao / dpo-trajectory-reasoningView external linksLinks
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆83Jan 14, 2025Updated last year
Alternatives and similar repositories for dpo-trajectory-reasoning
Users that are interested in dpo-trajectory-reasoning are comparing it to the libraries listed below
Sorting:
- triton ver of gqa flash attn, based on the tutorial☆12Aug 4, 2024Updated last year
- ☆72Apr 2, 2024Updated last year
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆690Jan 20, 2025Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆53Jun 6, 2025Updated 8 months ago
- [𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…☆51May 4, 2024Updated last year
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆134Mar 21, 2025Updated 10 months ago
- This the implementation of LeCo☆31Jan 20, 2025Updated last year
- A recipe for online RLHF and online iterative DPO.☆539Dec 28, 2024Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆120Dec 10, 2024Updated last year
- A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza…☆20Nov 21, 2024Updated last year
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback☆43Mar 14, 2024Updated last year
- Data and Code for the paper "FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains"☆24Aug 10, 2024Updated last year
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆391Jan 19, 2025Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆64Jul 8, 2024Updated last year
- Lightweight Adapting for Black-Box Large Language Models☆25Feb 15, 2024Updated 2 years ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated 11 months ago
- ☆23Jul 5, 2024Updated last year
- Repo of paper "Free Process Rewards without Process Labels"☆168Mar 14, 2025Updated 11 months ago
- ☆70Jun 18, 2025Updated 7 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆70Jul 13, 2025Updated 7 months ago
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning☆50Oct 11, 2024Updated last year
- iMessage RAG MCP Server from Anthropic MCP Hackathon (NYC)☆14Mar 10, 2025Updated 11 months ago
- ☆10Jan 28, 2024Updated 2 years ago
- Codebase for Inference-Time Policy Adapters☆25Nov 3, 2023Updated 2 years ago
- Personalized Fashion Compatibility Modeling via Metapath-guided Heterogeneous Graph Learning.☆15Nov 7, 2022Updated 3 years ago
- Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation"☆13Mar 1, 2023Updated 2 years ago
- ☆14May 9, 2024Updated last year
- 集中管理所有的prompt。☆14Nov 27, 2024Updated last year
- The official repository of our survey paper: "Towards a Unified View of Preference Learning for Large Language Models: A Survey"☆189Oct 28, 2024Updated last year
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆159Oct 30, 2024Updated last year
- Paper Reproduction Google SCoRE(Training Language Models to Self-Correct via Reinforcement Learning)☆142Sep 21, 2024Updated last year
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆329Jan 29, 2026Updated 2 weeks ago
- (ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training☆13Feb 15, 2025Updated last year
- Repo for the paper: Towards Few-shot Entity Recognition in Document Images:A Label-aware Sequence-to-Sequence Framework☆14May 31, 2023Updated 2 years ago
- [EMNLP'23] Code for 'Rethinking Negative Pairs in Code Search'☆14Oct 17, 2023Updated 2 years ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.☆18Apr 22, 2025Updated 9 months ago
- ☆554Jan 2, 2025Updated last year
- Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)☆51May 12, 2025Updated 9 months ago
- Awesome Triton Resources☆39Apr 27, 2025Updated 9 months ago