SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
☆177 Sep 18, 2025 Updated 5 months ago
Alternatives and similar repositories for spiral
Users who are interested in spiral are comparing it to the libraries listed below
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆62 Oct 24, 2025 Updated 4 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆354 Feb 3, 2026 Updated last month
- ☆25 Aug 19, 2025 Updated 6 months ago
- SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data ☆21 Jan 24, 2026 Updated last month
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆16 Feb 9, 2026 Updated last month
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆60 Jan 5, 2026 Updated 2 months ago
- [ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning". ☆32 Feb 26, 2026 Updated last week
- A Gym for Agentic LLMs ☆455 Jan 21, 2026 Updated last month
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents ☆37 Oct 7, 2025 Updated 5 months ago
- ☆20 Apr 16, 2025 Updated 10 months ago
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System" ☆69 Nov 14, 2024 Updated last year
- MUA-RL: MULTI-TURN USER-INTERACTING AGENT REINFORCEMENT LEARNING FOR AGENTIC TOOL USE ☆57 Nov 5, 2025 Updated 4 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆26 Aug 9, 2025 Updated 7 months ago
- ☆33 Jul 15, 2025 Updated 7 months ago
- a survey on deep research ☆47 Sep 9, 2025 Updated 6 months ago
- ☆19 Mar 10, 2025 Updated 11 months ago
- ☆31 Sep 12, 2025 Updated 5 months ago
- ☆90 Oct 30, 2025 Updated 4 months ago
- ☆72 Jun 10, 2025 Updated 8 months ago
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution" ☆70 Sep 13, 2025 Updated 5 months ago
- Dr. MAS is an end-to-end RL training framework for multi-agent LLM systems, supporting the co-training of multiple (heterogeneous) LLMs. ☆109 Feb 11, 2026 Updated 3 weeks ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆637 Jan 29, 2026 Updated last month
- PyTorch Implementation for the paper "Let Me Help You! Neuro-Symbolic Short-Context Action Anticipation" accepted to RA-L'24. ☆12 Nov 27, 2024 Updated last year
- The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25] ☆34 Sep 1, 2025 Updated 6 months ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆37 Nov 27, 2025 Updated 3 months ago
- Accelerating RL for LLM Reasoning with Optimal Advantage Regression ☆37 May 30, 2025 Updated 9 months ago
- TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs ☆23 Sep 21, 2025 Updated 5 months ago
- [ACL 2025 Main] (🏆 Outstanding Paper Award) Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Proba… ☆16 Aug 15, 2025 Updated 6 months ago
- ☆18 May 3, 2025 Updated 10 months ago
- ☆15 Nov 18, 2025 Updated 3 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 Jun 22, 2025 Updated 8 months ago
- ☆119 Feb 25, 2026 Updated last week
- ☆64 Jan 12, 2026 Updated last month
- Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL). ☆48 Oct 16, 2025 Updated 4 months ago
- ☆64 Feb 4, 2026 Updated last month
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi… ☆14 Jun 6, 2025 Updated 9 months ago
- ☆27 Jan 4, 2026 Updated 2 months ago
- [AAAI 2026] ReCode: Reinforced Code Knowledge Editing for API Updates ☆23 Jul 1, 2025 Updated 8 months ago
- Inverse Scaling in Test-Time Compute ☆25 Dec 3, 2025 Updated 3 months ago