Danau5tin/terminal-bench-rl

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Danau5tin/terminal-bench-rl)

Danau5tin / terminal-bench-rl

GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.

☆399

Alternatives and similar repositories for terminal-bench-rl

Users that are interested in terminal-bench-rl are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Danau5tin / tbench-agentic-data-pipeline
View on GitHub
Multi-agent synthetic data generation pipeline capable of generating and validating long horizon terminal/coding tasks for RL training
☆71Jul 28, 2025Updated 11 months ago
camel-ai / seta
View on GitHub
💻 SETA: Scaling Environments for Terminal Agents
☆127Jul 17, 2026Updated last week
harbor-framework / terminal-bench
View on GitHub
A benchmark for LLMs on complicated tasks in the terminal
☆2,483Jul 11, 2026Updated 2 weeks ago
camel-ai / seta-env
View on GitHub
💻 SETA: Scaling Environments for Terminal Agents - Environments
☆143Feb 16, 2026Updated 5 months ago
rllm-org / rllm
View on GitHub
Democratizing Reinforcement Learning for LLMs
☆5,731Updated this week
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
open-thoughts / OpenThoughts-Agent
View on GitHub
Data recipes and robust infrastructure for training AI agents
☆265Updated this week
SWE-Gym / SWE-Gym
View on GitHub
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
☆711Jul 29, 2025Updated 11 months ago
kanishkg / endless-terminals
View on GitHub
☆134Mar 31, 2026Updated 3 months ago
NovaSky-AI / SkyRL
View on GitHub
SkyRL: A Modular Full-stack RL Library for LLMs
☆2,093Updated this week
axon-rl / gem
View on GitHub
A Gym for Agentic LLMs
☆502Jan 21, 2026Updated 6 months ago
abundant-ai / SWE-gen
View on GitHub
Convert GitHub PRs into Harbor tasks
☆72Jul 13, 2026Updated last week
SWE-bench / SWE-smith
View on GitHub
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
☆711Updated this week
harbor-framework / harbor
View on GitHub
Framework for evaluating and improving agents
☆3,504Updated this week
MiniMax-AI / mini-vela
View on GitHub
☆37Apr 2, 2026Updated 3 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
R2E-Gym / R2E-Gym
View on GitHub
[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆310Jul 13, 2025Updated last year
THUDM / slime
View on GitHub
slime is an LLM post-training framework for RL Scaling.
☆7,629Updated this week
PrimeIntellect-ai / verifiers
View on GitHub
Our library for RL environments + evals
☆4,400Updated this week
sail-sg / understand-r1-zero
View on GitHub
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,268Aug 27, 2025Updated 10 months ago
hkust-nlp / Toolathlon
View on GitHub
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
☆440Updated this week
PrimeIntellect-ai / prime-rl
View on GitHub
Agentic RL Training at Scale
☆1,724Updated this week
alibaba / terminal-bench-pro
View on GitHub
☆119Apr 1, 2026Updated 3 months ago
ltzheng / SimpleTIR
View on GitHub
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆401Mar 30, 2026Updated 3 months ago
sail-sg / tty-use
View on GitHub
☆15Oct 13, 2025Updated 9 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
LeiLiLab / HardTestGen
View on GitHub
☆17Jan 27, 2026Updated 5 months ago
facebookresearch / swe-rl
View on GitHub
[NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
☆712Mar 16, 2025Updated last year
harbor-framework / terminal-bench-2
View on GitHub
☆344Apr 30, 2026Updated 2 months ago
JierunChen / SFT-RL-SynergyDilemma
View on GitHub
☆15Jan 14, 2026Updated 6 months ago
areal-project / AReaL
View on GitHub
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
☆5,599Updated this week
radixark / miles
View on GitHub
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
☆1,789Updated this week
SWE-agent / SWE-ReX
View on GitHub
Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
☆555Updated this week
OpenRLHF / OpenRLHF
View on GitHub
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,848Jul 14, 2026Updated last week
verl-project / verl
View on GitHub
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,654Updated this week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
TIGER-AI-Lab / verl-tool
View on GitHub
A version of verl to support diverse tool use [TMLR 2026]
☆1,024Jul 15, 2026Updated last week
open-thought / reasoning-gym
View on GitHub
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
☆1,468Apr 17, 2026Updated 3 months ago
harbor-framework / harbor-datasets
View on GitHub
☆36May 16, 2026Updated 2 months ago
nilsleut / Biologically-Plausible-RL-Plays-Pong
View on GitHub
A technical exploration comparing standard deep RL (PPO) against biologically plausible learning rules on a custom Pong environment. Ever…
☆29May 19, 2026Updated 2 months ago
lblankl / SWE-MiniSandbox
View on GitHub
Container-free RL framework for training software engineering agents
☆72Jun 24, 2026Updated last month
Open-Reasoner-Zero / Open-Reasoner-Zero
View on GitHub
Official Repo for Open-Reasoner-Zero
☆2,096Jun 2, 2025Updated last year
multimodal-art-projection / NL2RepoBench
View on GitHub
☆144May 13, 2026Updated 2 months ago