McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆186 · May 25, 2025 · Updated 8 months ago
Alternatives and similar repositories for VinePPO
Users who are interested in VinePPO are comparing it to the libraries listed below.
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 · May 21, 2025 · Updated 8 months ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin… ☆51 · May 4, 2024 · Updated last year
- Self-Supervised Alignment with Mutual Information ☆20 · May 24, 2024 · Updated last year
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆202 · Apr 17, 2025 · Updated 9 months ago
- ☆34 · Sep 14, 2024 · Updated last year
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ☆27 · Mar 1, 2025 · Updated 11 months ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" ☆31 · Jun 5, 2025 · Updated 8 months ago
- ☆342 · Jun 5, 2025 · Updated 8 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆120 · Dec 10, 2024 · Updated last year
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ☆331 · Apr 24, 2025 · Updated 9 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆329 · Jan 29, 2026 · Updated 2 weeks ago
- A Gym for Agentic LLMs ☆446 · Jan 21, 2026 · Updated 3 weeks ago
- Process Reward Models That Think ☆77 · Nov 29, 2025 · Updated 2 months ago
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003) ☆23 · Oct 2, 2025 · Updated 4 months ago
- ☆123 · Feb 21, 2025 · Updated 11 months ago
- ☆331 · May 31, 2025 · Updated 8 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆628 · Jan 29, 2026 · Updated 2 weeks ago
- Codebase for HiP ☆90 · Dec 15, 2023 · Updated 2 years ago
- Recipes to train reward model for RLHF. ☆1,512 · Apr 24, 2025 · Updated 9 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆78 · Aug 17, 2024 · Updated last year
- The code and data for the paper JiuZhang3.0 ☆49 · May 26, 2024 · Updated last year
- Lipschitz Lifelong RL ☆11 · Nov 6, 2020 · Updated 5 years ago
- Scalable RL solution for advanced reasoning of language models ☆1,805 · Mar 18, 2025 · Updated 10 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆96 · Apr 9, 2025 · Updated 10 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆120 · May 6, 2025 · Updated 9 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆32 · Apr 20, 2024 · Updated last year
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,209 · Aug 27, 2025 · Updated 5 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆63 · Oct 3, 2025 · Updated 4 months ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) ☆73 · Jun 25, 2024 · Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆148 · Nov 26, 2024 · Updated last year
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆391 · Jan 19, 2025 · Updated last year
- ☆224 · Mar 26, 2025 · Updated 10 months ago
- Self-Alignment with Principle-Following Reward Models ☆169 · Sep 18, 2025 · Updated 4 months ago
- ☆30 · Jun 19, 2023 · Updated 2 years ago
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality ☆40 · Dec 1, 2025 · Updated 2 months ago
- A version of verl to support diverse tool use ☆868 · Jan 6, 2026 · Updated last month
- A Sober Look at Language Model Reasoning ☆92 · Nov 18, 2025 · Updated 2 months ago
- A flexible and efficient training framework for large-scale alignment tasks ☆449 · Oct 23, 2025 · Updated 3 months ago
- ☆1,095 · Jan 10, 2026 · Updated last month