verl-project / verl-recipe
A set of examples based on verl for end-to-end RL training recipes.
☆58 · Updated last week
Alternatives and similar repositories for verl-recipe
Users interested in verl-recipe are comparing it to the libraries listed below.
- MiroRL is an MCP-first reinforcement learning framework for deep research agents. ☆182 · Updated 3 months ago
- ☆112 · Updated 2 months ago
- ☆105 · Updated 3 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop. ☆279 · Updated last month
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆332 · Updated 2 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆105 · Updated 2 months ago
- ☆122 · Updated 6 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆190 · Updated 8 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] ☆209 · Updated 2 weeks ago
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024) ☆57 · Updated last year
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆80 · Updated 2 months ago
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆99 · Updated 11 months ago
- The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning ☆328 · Updated 6 months ago
- ☆213 · Updated 9 months ago
- Towards a Unified View of Large Language Model Post-Training ☆192 · Updated 3 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆240 · Updated 3 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆181 · Updated 4 months ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆161 · Updated 2 months ago
- ☆344 · Updated 4 months ago
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ☆181 · Updated 9 months ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆89 · Updated last year
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ☆118 · Updated 6 months ago
- ☆62 · Updated 5 months ago
- ☆86 · Updated 3 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆89 · Updated last year
- Async pipelined version of Verl ☆125 · Updated 8 months ago
- PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining p… ☆33 · Updated 3 months ago
- ☆171 · Updated last week
- MiroTrain is an efficient and algorithm-first framework for post-training large agentic models. ☆99 · Updated 3 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆167 · Updated last year