Agent-One-Lab / AgentFly
Scalable and extensible reinforcement learning for LM agents.
★84 · Updated last week
Alternatives and similar repositories for AgentFly
Users who are interested in AgentFly are comparing it to the libraries listed below.
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ★258 · Updated 5 months ago
- Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ★270 · Updated this week
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ★246 · Updated 5 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025) ★154 · Updated last week
- ★214 · Updated 2 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ★305 · Updated last month
- Test-time preference optimization (ICML 2025). ★168 · Updated 5 months ago
- Code for the paper: "Learning to Reason without External Rewards" ★364 · Updated 3 months ago
- Towards a Unified View of Large Language Model Post-Training ★163 · Updated last month
- This is a repository for organizing papers, codes, and other resources related to Latent Reasoning. ★247 · Updated 3 weeks ago
- A version of verl to support diverse tool use ★607 · Updated this week
- [NeurIPS 2025 Spotlight] ReasonFlux Series - ReasonFlux, ReasonFlux-PRM and ReasonFlux-Coder ★492 · Updated 3 weeks ago
- ★365 · Updated this week
- ★300 · Updated 4 months ago
- MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning. ★236 · Updated 2 months ago
- Repo for paper https://arxiv.org/abs/2504.13837 ★200 · Updated 3 months ago
- The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ★170 · Updated 3 months ago
- ★333 · Updated 2 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ★348 · Updated 2 weeks ago
- ★211 · Updated 8 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ★130 · Updated 4 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ★364 · Updated last week
- ★228 · Updated this week
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL. ★459 · Updated last month
- Generative AI Act II: Test Time Scaling Drives Cognition Engineering ★207 · Updated 5 months ago
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference ★301 · Updated last week
- ★147 · Updated last week
- ★323 · Updated 4 months ago
- ✨ Agentic Reinforced Policy Optimization ★654 · Updated this week
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ★369 · Updated this week