spiral-rl/spiral

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

☆199

Alternatives and similar repositories for spiral

Users who are interested in spiral are comparing it to the libraries listed below.

TextArena / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆411 · Updated this week
sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆667 · Jan 29, 2026 · Updated 5 months ago
TextArena / UnstableBaselines
☆120 · Apr 7, 2026 · Updated 3 months ago
thu-nics / MARSHAL
[ICLR'26] MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
☆54 · Apr 17, 2026 · Updated 3 months ago
spinbench / spinbench
☆28 · May 30, 2026 · Updated last month
sotopia-lab / sotopia-rl
Sotopia-RL: Reward Design for Social Intelligence
☆52 · Apr 1, 2026 · Updated 3 months ago
axon-rl / gem
A Gym for Agentic LLMs
☆502 · Jan 21, 2026 · Updated 6 months ago
yunfeixie233 / ViGaL
☆70 · Feb 4, 2026 · Updated 5 months ago
lmgame-org / GRL
Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning
☆65 · Dec 18, 2025 · Updated 7 months ago
sail-sg / VeriFree
Reinforcing General Reasoning without Verifiers
☆102 · Jun 24, 2025 · Updated last year
sail-sg / Precision-RL
Defeating the Training-Inference Mismatch via FP16
☆197 · Nov 14, 2025 · Updated 8 months ago
sunblaze-ucb / Intuitor
[ICLR 2026] Learning to Reason without External Rewards
☆417 · Jan 26, 2026 · Updated 5 months ago
sail-sg / feedback-conditional-policy
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
☆65 · Jan 5, 2026 · Updated 6 months ago
BytedTsinghua-SIA / Enigmata
Resources for the Enigmata Project.
☆82 · Aug 13, 2025 · Updated 11 months ago
mll-lab-nu / RAGEN
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
☆2,753 · Apr 14, 2026 · Updated 3 months ago
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆54 · Jul 15, 2025 · Updated last year
multimodal-art-projection / KORGym
☆60 · May 21, 2025 · Updated last year
sail-sg / tty-use
☆15 · Oct 13, 2025 · Updated 9 months ago
MozerWang / AMPO
[ICLR 2026] Adaptive Social Learning via Mode Policy Optimization for Language Agents
☆51 · Feb 2, 2026 · Updated 5 months ago
ZhaolinGao / REFUEL
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
☆25 · Oct 8, 2024 · Updated last year
wangqinsi1 / Vision-Zero
[ICLR 2026] Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play.
☆136 · Feb 6, 2026 · Updated 5 months ago
WentseChen / Verlog
Verlog: A Multi-turn RL framework for LLM agents
☆73 · Apr 28, 2026 · Updated 2 months ago
open-thought / reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
☆1,463 · Apr 17, 2026 · Updated 3 months ago
JingMog / THOR
[ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning".
☆33 · Feb 26, 2026 · Updated 4 months ago
wantbook-book / SeRL
SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
☆24 · Jan 24, 2026 · Updated 5 months ago
TIGER-AI-Lab / One-Shot-CFT
The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25]
☆33 · Sep 1, 2025 · Updated 10 months ago
TIGER-AI-Lab / verl-tool
A version of verl to support diverse tool use [TMLR 2026]
☆1,021 · Updated this week
Chengsong-Huang / R-Zero
[ICLR2026] codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004)
☆824 · Feb 4, 2026 · Updated 5 months ago
sail-sg / ActivePRM
☆21 · Apr 16, 2025 · Updated last year
sotopia-lab / sotopia
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
☆317 · Jun 5, 2026 · Updated last month
Trae1ounG / Pretrain_Space_RLVR
[arxiv: 2604.14142] From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
☆17 · Apr 16, 2026 · Updated 3 months ago
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆2,085 · Updated this week
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,140 · Jun 9, 2026 · Updated last month
TsinghuaC3I / MARTI
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
☆538 · Apr 14, 2026 · Updated 3 months ago
bigai-nlco / RuleReasoner
[ICLR 2026] RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
☆39 · Feb 25, 2026 · Updated 4 months ago
Interplay-LM-Reasoning / Interplay-LM-Reasoning
[ICML 2026 Spotlight] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
☆162 · Jun 8, 2026 · Updated last month
sail-sg / P-DoS
[ArXiv 2025] Denial-of-Service Poisoning Attacks on Large Language Models
☆23 · Oct 22, 2024 · Updated last year
TEAM-ARM / arm
[NeurIPS'25 Spotlight] ARM: Adaptive Reasoning Model
☆68 · Apr 6, 2026 · Updated 3 months ago
eddycmu / demystify-long-cot
☆336 · May 31, 2025 · Updated last year