A set of examples based on verl for end-to-end RL training recipes.
☆247 · Apr 15, 2026 · Updated this week
Alternatives and similar repositories for verl-recipe
Users that are interested in verl-recipe are comparing it to the libraries listed below.
- Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg… ☆31 · Oct 10, 2025 · Updated 6 months ago
- ☆16 · Jul 29, 2025 · Updated 8 months ago
- Adaptive Multimodal Reasoning via Reinforcement Learning ☆23 · Jan 11, 2026 · Updated 3 months ago
- Official code repository of Shuffle-R1 ☆25 · Feb 23, 2026 · Updated last month
- ☆17 · Nov 3, 2024 · Updated last year
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning" ☆37 · Oct 1, 2025 · Updated 6 months ago
- [TMLR] Process Reward Models That Think ☆85 · Nov 29, 2025 · Updated 4 months ago
- Resa: Transparent Reasoning Models via SAEs ☆48 · Sep 23, 2025 · Updated 6 months ago
- A version of verl to support diverse tool use ☆949 · Mar 2, 2026 · Updated last month
- Code for "Variational Reasoning for Language Models" ☆59 · Sep 29, 2025 · Updated 6 months ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆68 · Mar 5, 2026 · Updated last month
- Saving Dense Retriever from Shortcut Dependency in Conversational Search (EMNLP 2022) ☆18 · Nov 24, 2022 · Updated 3 years ago
- ☆41 · Jul 15, 2025 · Updated 9 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25] ☆187 · Jun 5, 2025 · Updated 10 months ago
- MUA-RL: MULTI-TURN USER-INTERACTING AGENT REINFORCEMENT LEARNING FOR AGENTIC TOOL USE ☆58 · Nov 5, 2025 · Updated 5 months ago
- ☆11 · Jul 21, 2024 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆69 · Jul 20, 2023 · Updated 2 years ago
- Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024) ☆18 · Oct 22, 2024 · Updated last year
- Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images ☆57 · Nov 4, 2025 · Updated 5 months ago
- An Ultra-Long Output Reinforcement Learning Approach ☆23 · Jul 31, 2025 · Updated 8 months ago
- Distributed IO-aware Attention algorithm ☆24 · Sep 24, 2025 · Updated 6 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆437 · Mar 20, 2026 · Updated last month
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆1,175 · Jul 15, 2025 · Updated 9 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆20,789 · Updated this week
- Ideas for projects related to Tinker ☆173 · Nov 6, 2025 · Updated 5 months ago
- ☆71 · Aug 6, 2025 · Updated 8 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆171 · Mar 23, 2025 · Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆51 · Updated this week
- ICML 2025 Spotlight, PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative AP… ☆14 · Jun 27, 2025 · Updated 9 months ago
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL ☆4,860 · Apr 6, 2026 · Updated 2 weeks ago
- DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation. ☆136 · Feb 10, 2026 · Updated 2 months ago
- (ICLR 2026) Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing" ☆59 · Jan 26, 2026 · Updated 2 months ago
- AFlow & MathAI ☆19 · Feb 24, 2025 · Updated last year
- ☆41 · Jul 6, 2025 · Updated 9 months ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆11 · Dec 13, 2023 · Updated 2 years ago
- ☆10 · Sep 18, 2017 · Updated 8 years ago
- [ICCV 2025] Dynamic-VLM ☆28 · Dec 16, 2024 · Updated last year
- OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation ☆35 · Apr 1, 2026 · Updated 2 weeks ago
- Indonesian speech/phoneme recognizer powered by Kaldi 2.0 (lhotse, icefall, sherpa). ☆15 · Jun 30, 2023 · Updated 2 years ago