Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆186May 25, 2025Updated 9 months ago
Alternatives and similar repositories for VinePPO
Users that are interested in VinePPO are comparing it to the libraries listed below
Sorting:
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆86May 21, 2025Updated 9 months ago
- [𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…☆51May 4, 2024Updated last year
- Self-Supervised Alignment with Mutual Information☆20May 24, 2024Updated last year
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆202Apr 17, 2025Updated 10 months ago
- ☆34Sep 14, 2024Updated last year
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"☆31Jun 5, 2025Updated 9 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers☆28Mar 1, 2025Updated last year
- ☆342Jun 5, 2025Updated 9 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆121Dec 10, 2024Updated last year
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation☆333Apr 24, 2025Updated 10 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆329Jan 29, 2026Updated last month
- [TMLR] Process Reward Models That Think☆81Nov 29, 2025Updated 3 months ago
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003)☆23Oct 2, 2025Updated 5 months ago
- ☆123Feb 21, 2025Updated last year
- ☆332May 31, 2025Updated 9 months ago
- A Gym for Agentic LLMs☆455Jan 21, 2026Updated last month
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.☆637Jan 29, 2026Updated last month
- Codebase for HiP☆90Dec 15, 2023Updated 2 years ago
- Recipes to train reward model for RLHF.☆1,517Apr 24, 2025Updated 10 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…☆78Aug 17, 2024Updated last year
- The code and data for the paper JiuZhang3.0☆49May 26, 2024Updated last year
- Lipschitz Lifelong RL☆11Nov 6, 2020Updated 5 years ago
- Scalable RL solution for advanced reasoning of language models☆1,811Mar 18, 2025Updated 11 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]☆98Apr 9, 2025Updated 11 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…☆32Apr 20, 2024Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆123May 6, 2025Updated 10 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective☆1,222Aug 27, 2025Updated 6 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…☆63Oct 3, 2025Updated 5 months ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)☆73Jun 25, 2024Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆148Nov 26, 2024Updated last year
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆392Jan 19, 2025Updated last year
- ☆225Mar 26, 2025Updated 11 months ago
- Self-Alignment with Principle-Following Reward Models☆170Sep 18, 2025Updated 5 months ago
- ☆30Jun 19, 2023Updated 2 years ago
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality☆40Dec 1, 2025Updated 3 months ago
- A Sober Look at Language Model Reasoning☆93Nov 18, 2025Updated 3 months ago
- A version of verl to support diverse tool use☆889Mar 2, 2026Updated last week
- A flexible and efficient training framework for large-scale alignment tasks☆451Oct 23, 2025Updated 4 months ago
- ☆53Feb 11, 2025Updated last year