Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆189May 25, 2025Updated 10 months ago
Alternatives and similar repositories for VinePPO
Users that are interested in VinePPO are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆86May 21, 2025Updated 10 months ago
- ☆35Sep 14, 2024Updated last year
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆202Apr 17, 2025Updated 11 months ago
- Self-Supervised Alignment with Mutual Information☆20May 24, 2024Updated last year
- The code and data for the paper JiuZhang3.0☆49May 26, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- Scalable RL solution for advanced reasoning of language models☆1,831Mar 18, 2025Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆121Dec 10, 2024Updated last year
- Understanding R1-Zero-Like Training: A Critical Perspective☆1,237Aug 27, 2025Updated 7 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆123May 6, 2025Updated 10 months ago
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆392Jan 19, 2025Updated last year
- ☆334May 31, 2025Updated 9 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆124Sep 9, 2024Updated last year
- ☆225Mar 26, 2025Updated last year
- Recipes to train reward model for RLHF.☆1,523Apr 24, 2025Updated 11 months ago
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆329Jan 29, 2026Updated 2 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.☆641Jan 29, 2026Updated 2 months ago
- A Gym for Agentic LLMs☆471Jan 21, 2026Updated 2 months ago
- ☆124Feb 21, 2025Updated last year
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation☆334Apr 24, 2025Updated 11 months ago
- [𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…☆51May 4, 2024Updated last year
- Simple RL training for reasoning☆3,842Dec 23, 2025Updated 3 months ago
- [TMLR] Process Reward Models That Think☆83Nov 29, 2025Updated 4 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆148Nov 26, 2024Updated last year
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]☆99Apr 9, 2025Updated 11 months ago
- Lipschitz Lifelong RL☆11Nov 6, 2020Updated 5 years ago
- Codebase for HiP☆90Dec 15, 2023Updated 2 years ago
- ☆342Jun 5, 2025Updated 9 months ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models☆68Mar 5, 2026Updated 3 weeks ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆71Jul 13, 2025Updated 8 months ago
- ☆30Jun 19, 2023Updated 2 years ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆66Oct 18, 2024Updated last year
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers☆28Mar 1, 2025Updated last year
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- A flexible and efficient training framework for large-scale alignment tasks☆451Oct 23, 2025Updated 5 months ago
- A bibliography and survey of the papers surrounding o1☆1,213Nov 16, 2024Updated last year
- A version of verl to support diverse tool use☆923Mar 2, 2026Updated 3 weeks ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆62Mar 30, 2024Updated last year
- Repo of paper "Free Process Rewards without Process Labels"☆170Mar 14, 2025Updated last year
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"☆32Jun 5, 2025Updated 9 months ago
- ☆1,118Jan 10, 2026Updated 2 months ago