Danau5tin / terminal-bench-rl
GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.
☆347Aug 24, 2025Updated 5 months ago
Alternatives and similar repositories for terminal-bench-rl
Users that are interested in terminal-bench-rl are comparing it to the libraries listed below
- Multi-agent synthetic data generation pipeline capable of generating and validating long horizon terminal/coding tasks for RL training☆51Jul 28, 2025Updated 6 months ago
- ☆33Jan 25, 2026Updated 3 weeks ago
- A dashboard for exploring timm learning rate schedulers☆19Nov 22, 2024Updated last year
- A benchmark for LLMs on complicated tasks in the terminal☆1,540Jan 22, 2026Updated 3 weeks ago
- ☆23Jun 7, 2023Updated 2 years ago
- Democratizing Reinforcement Learning for LLMs☆5,106Updated this week
- 🎉 TrustJudge is accepted to ICLR 2026!☆38Sep 27, 2025Updated 4 months ago
- [NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search☆17Jan 24, 2026Updated 3 weeks ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆47Apr 15, 2025Updated 10 months ago
- Our library for RL environments + evals☆3,833Updated this week
- FeatureBench: Benchmarking Agentic Coding for Complex Feature Development [ICLR 2026]☆18Updated this week
- Notebooks to demonstrate TimmWrapper☆16Jan 16, 2025Updated last year
- 🚀 Official code for “XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression”, …☆30Jan 27, 2026Updated 3 weeks ago
- [ICLR 2025] Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization☆12Jan 26, 2025Updated last year
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…☆15Oct 16, 2023Updated 2 years ago
- Pytorch script hot swap: Change code without unloading your LLM from VRAM☆125Apr 21, 2025Updated 9 months ago
- Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement…☆8,596Feb 10, 2026Updated last week
- Understanding R1-Zero-Like Training: A Critical Perspective☆1,209Aug 27, 2025Updated 5 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]☆218Nov 27, 2025Updated 2 months ago
- Adversarial attack comparative assessment Large Language Model☆13May 21, 2025Updated 8 months ago
- A lightweight, reproducible toolkit for LLM-based query reformulation.☆29Jan 3, 2026Updated last month
- ☆20Feb 13, 2025Updated last year
- A Gym for Agentic LLMs☆446Jan 21, 2026Updated 3 weeks ago
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition.☆15May 16, 2025Updated 9 months ago
- ☆13May 7, 2024Updated last year
- Multi-Node Swarm on your laptop /w Docker-in-Docker -- Fun Stackfiles☆15Oct 9, 2017Updated 8 years ago
- What do we learn from inverting CLIP models?☆58Mar 6, 2024Updated last year
- Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https://arxiv.org/abs/2410.03489)☆19Oct 22, 2024Updated last year
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse …☆855Updated this week
- Framework for specifying and proving properties—such as robustness, fairness, and interpretability—of machine learning models using Lean …☆79Jul 30, 2025Updated 6 months ago
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval☆15Jan 16, 2024Updated 2 years ago
- OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System.☆19Oct 14, 2024Updated last year
- Deep grasp library for ROS2☆18Dec 17, 2023Updated 2 years ago
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation☆69Jan 15, 2026Updated last month
- Official code for SplatSuRe: Selective Super-Resolution for Multi-view Consistent 3D Gaussian Splatting☆44Dec 3, 2025Updated 2 months ago
- Simple RL training for reasoning☆3,827Dec 23, 2025Updated last month
- a rust typescript integration☆64Jul 4, 2025Updated 7 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.☆628Jan 29, 2026Updated 2 weeks ago
- Using JAX to generate piano music as MIDI☆39Nov 28, 2023Updated 2 years ago