Agent-One-Lab / AgentFly
Scalable and extensible reinforcement learning for LM agents.
☆97 Updated this week
Alternatives and similar repositories for AgentFly
Users who are interested in AgentFly are comparing it to the libraries listed below.
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents ☆263 Updated last month
- Towards a Unified View of Large Language Model Post-Training ☆199 Updated 4 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆348 Updated 3 months ago
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆305 Updated last week
- ☆326 Updated 7 months ago
- Implementation for OAgents: An Empirical Study of Building Effective Agents ☆299 Updated 2 months ago
- ☆255 Updated 4 months ago
- [NeurIPS 2025 Spotlight] ReasonFlux (long-CoT), ReasonFlux-PRM (process reward model) and ReasonFlux-Coder (code generation) ☆513 Updated 3 months ago
- Generative AI Act II: Test Time Scaling Drives Cognition Engineering ☆209 Updated 8 months ago
- ☆404 Updated 2 months ago
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆187 Updated 6 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆140 Updated 7 months ago
- An Open-Source Large-Scale Reinforcement Learning Project for Search Agents ☆529 Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆257 Updated 7 months ago
- A version of verl to support diverse tool use ☆805 Updated this week
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆395 Updated 3 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆464 Updated last week
- MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning. ☆247 Updated 4 months ago
- ☆215 Updated 10 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025) ☆171 Updated 2 months ago
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆173 Updated 3 months ago
- The official code of ARPO & AEPO ☆843 Updated last week
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL. ☆520 Updated 4 months ago
- MiroRL is an MCP-first reinforcement learning framework for deep research agent. ☆186 Updated 4 months ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆254 Updated 8 months ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆163 Updated 3 months ago
- ☆346 Updated 5 months ago
- The official implementation of the paper "Mem-α: Learning Memory Construction via Reinforcement Learning" ☆141 Updated 2 weeks ago
- Open Source Implementation of Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evo… ☆97 Updated 5 months ago
- ☆480 Updated 3 months ago