[ICLR 2026] Agentic Reinforced Policy Optimization (ARPO)
⭐ 892 · Jan 28, 2026 · Updated last month
Alternatives and similar repositories for ARPO
Users who are interested in ARPO are comparing it to the libraries listed below
- 🔧 Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning · ⭐ 319 · Jan 3, 2026 · Updated 2 months ago
- [ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning · ⭐ 358 · Jan 12, 2026 · Updated last month
- RAG methods, benchmarks, and toolkits · ⭐ 19 · Nov 28, 2024 · Updated last year
- verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in… · ⭐ 1,563 · Updated this week
- ⭐ 32 · Feb 13, 2026 · Updated 2 weeks ago
- ⭐ 170 · Feb 24, 2026 · Updated last week
- A version of verl to support diverse tool use · ⭐ 879 · Feb 19, 2026 · Updated last week
- ⭐ 67 · Aug 14, 2025 · Updated 6 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay · ⭐ 148 · May 29, 2025 · Updated 9 months ago
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei… · ⭐ 1,328 · May 16, 2025 · Updated 9 months ago
- Some example codes for drawing figures in research paper · ⭐ 35 · Mar 3, 2022 · Updated 4 years ago
- Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL · ⭐ 4,085 · Nov 13, 2025 · Updated 3 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs · ⭐ 19,519 · Updated this week
- ⭐ 215 · Feb 20, 2025 · Updated last year
- An Open-source RL System from ByteDance Seed and Tsinghua AIR · ⭐ 1,739 · May 11, 2025 · Updated 9 months ago
- ⭐ 179 · Dec 5, 2025 · Updated 2 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. · ⭐ 705 · Oct 15, 2025 · Updated 4 months ago
- 🔍 Search-o1: Agentic Search-Enhanced Large Reasoning Models [EMNLP 2025] · ⭐ 1,172 · Nov 17, 2025 · Updated 3 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… · ⭐ 402 · Aug 26, 2025 · Updated 6 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling · ⭐ 183 · Jul 23, 2025 · Updated 7 months ago
- Official Repo for Open-Reasoner-Zero · ⭐ 2,087 · Jun 2, 2025 · Updated 9 months ago
- Democratizing Reinforcement Learning for LLMs · ⭐ 5,167 · Updated this week
- A series of technical reports on Slow Thinking with LLM · ⭐ 760 · Aug 13, 2025 · Updated 6 months ago
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL) · ⭐ 9,037 · Feb 21, 2026 · Updated last week
- ⭐ 1,584 · Jan 20, 2026 · Updated last month
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL. · ⭐ 536 · Sep 8, 2025 · Updated 5 months ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. · ⭐ 2,522 · Updated this week
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL · ⭐ 4,649 · Updated this week
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward · ⭐ 946 · Feb 16, 2025 · Updated last year
- An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models · ⭐ 2,881 · Updated this week
- Simple RL training for reasoning · ⭐ 3,830 · Dec 23, 2025 · Updated 2 months ago
- ⭐ 813 · Jun 9, 2025 · Updated 8 months ago
- ⭐ 33 · Jul 15, 2025 · Updated 7 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective · ⭐ 1,219 · Aug 27, 2025 · Updated 6 months ago
- SSRL: Self-Search Reinforcement Learning · ⭐ 207 · Aug 20, 2025 · Updated 6 months ago
- ⭐ 335 · May 24, 2025 · Updated 9 months ago
- A library for generating difficulty-scalable, multi-tool, and verifiable agentic tasks with execution trajectories. · ⭐ 178 · Jul 6, 2025 · Updated 7 months ago
- Reproduce R1 Zero on Logic Puzzle · ⭐ 2,439 · Mar 20, 2025 · Updated 11 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. · ⭐ 421 · Jul 11, 2025 · Updated 7 months ago