Agent-One-Lab / AgentFly
Scalable and extensible reinforcement learning for LM agents.
☆92 Updated last week
Alternatives and similar repositories for AgentFly
Users who are interested in AgentFly are comparing it to the libraries listed below.
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆286 Updated 3 weeks ago
- ☆309 Updated 5 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆318 Updated last month
- [NeurIPS 2025 Spotlight] ReasonFlux Series - ReasonFlux, ReasonFlux-PRM and ReasonFlux-Coder ☆496 Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆258 Updated 6 months ago
- ☆231 Updated 3 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025) ☆166 Updated last week
- MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning. ☆240 Updated 3 months ago
- Towards a Unified View of Large Language Model Post-Training ☆183 Updated 2 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆135 Updated 5 months ago
- A version of verl to support diverse tool use ☆668 Updated last week
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆364 Updated last month
- The official code of ARPO & AEPO ☆778 Updated last week
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL. ☆485 Updated 2 months ago
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆249 Updated 6 months ago
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆167 Updated last month
- MiroRL is an MCP-first reinforcement learning framework for deep research agent. ☆170 Updated 2 months ago
- An Open-Source Large-Scale Reinforcement Learning Project for Search Agents ☆490 Updated last month
- ☆378 Updated 3 weeks ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆160 Updated last month
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ☆347 Updated 2 months ago
- Training VLM agents with multi-turn reinforcement learning ☆304 Updated this week
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents ☆161 Updated 2 weeks ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆132 Updated 8 months ago
- Implementation for OAgents: An Empirical Study of Building Effective Agents ☆280 Updated last month
- ☆212 Updated 8 months ago
- ☆336 Updated 3 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆190 Updated 7 months ago
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆180 Updated 4 months ago
- repo for paper https://arxiv.org/abs/2504.13837 ☆217 Updated 4 months ago