McGill-NLP/VinePPO

Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"

☆192

Alternatives and similar repositories for VinePPO

Users who are interested in VinePPO are comparing it to the libraries listed below.

hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86 · May 21, 2025 · Updated last year
swtheing / PF-PPO-RLHF
☆34 · Sep 14, 2024 · Updated last year
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆20 · May 24, 2024 · Updated 2 years ago
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆208 · Apr 17, 2025 · Updated last year
RUCAIBox / JiuZhang3.0
The code and data for the paper JiuZhang3.0
☆49 · May 26, 2024 · Updated 2 years ago
axon-rl / gem
A Gym for Agentic LLMs
☆502 · Jan 21, 2026 · Updated 5 months ago
PRIME-RL / PRIME
Scalable RL solution for advanced reasoning of language models
☆1,865 · Mar 18, 2025 · Updated last year
complex-reasoning / RPG
[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆76 · Jun 29, 2026 · Updated 3 weeks ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆120 · Dec 10, 2024 · Updated last year
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,267 · Aug 27, 2025 · Updated 10 months ago
McGill-NLP / the-markovian-thinker
Code for paper "The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning"
☆349 · Mar 16, 2026 · Updated 4 months ago
CMU-AIRe / POPE
☆27 · Jan 31, 2026 · Updated 5 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆126 · May 6, 2025 · Updated last year
JIA-Lab-research / Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
☆398 · Jan 19, 2025 · Updated last year
eddycmu / demystify-long-cot
☆336 · May 31, 2025 · Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆124 · Sep 9, 2024 · Updated last year
WujiangXu / EPO
The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"
☆40 · Jul 13, 2026 · Updated last week
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆331 · Jan 29, 2026 · Updated 5 months ago
RLHFlow / RLHF-Reward-Modeling
Recipes to train reward model for RLHF.
☆1,534 · Apr 24, 2025 · Updated last year
kanishkg / cognitive-behaviors
☆224 · Mar 26, 2025 · Updated last year
SalesforceAIResearch / LaTRO
☆127 · Jun 2, 2026 · Updated last month
openpsi-project / ReaLHF
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
☆335 · Apr 24, 2025 · Updated last year
hbin0701 / Self-Explore
[EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin…
☆52 · May 4, 2024 · Updated 2 years ago
hkust-nlp / simpleRL-reason
Simple RL training for reasoning
☆3,868 · Dec 23, 2025 · Updated 6 months ago
TIGER-AI-Lab / AceCoder
The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]
☆100 · Apr 9, 2025 · Updated last year
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆149 · Nov 26, 2024 · Updated last year
SuReLI / llrl
Lipschitz Lifelong RL
☆11 · Nov 6, 2020 · Updated 5 years ago
anuragajay / hip
Codebase for HiP
☆90 · Dec 15, 2023 · Updated 2 years ago
MARIO-Math-Reasoning / Super_MARIO
☆341 · Jun 5, 2025 · Updated last year
hkust-nlp / model-task-align-rl
[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆18 · Feb 9, 2026 · Updated 5 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆75 · Jul 13, 2025 · Updated last year
mnoukhov / async_rlhf
Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models
☆68 · Mar 5, 2026 · Updated 4 months ago
sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆667 · Jan 29, 2026 · Updated 5 months ago
ethz-spylab / superhuman-ai-consistency
☆30 · Jun 19, 2023 · Updated 3 years ago
WeiminXiong / IPR
Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)
☆68 · Oct 18, 2024 · Updated last year
srush / awesome-o1
A bibliography and survey of the papers surrounding o1
☆1,214 · Jul 7, 2026 · Updated last week
alibaba / ChatLearn
A flexible and efficient training framework for large-scale alignment tasks
☆452 · Oct 23, 2025 · Updated 8 months ago
flowersteam / WorldLLM
LLM as World Models using Bayesian inference
☆21 · May 27, 2025 · Updated last year
iiis-ai / IterativeQuestionComposing
[AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003)
☆23 · Oct 2, 2025 · Updated 9 months ago