Harbor is a framework for running agent evaluations and creating and using RL environments.
☆1,757 · May 3, 2026 · Updated this week
Alternatives and similar repositories for harbor
Users that are interested in harbor are comparing it to the libraries listed below.
- A benchmark for LLMs on complicated tasks in the terminal ☆2,142 · Jan 22, 2026 · Updated 3 months ago
- Multi-agent synthetic data generation pipeline capable of generating and validating long horizon terminal/coding tasks for RL training ☆62 · Jul 28, 2025 · Updated 9 months ago
- SkyRL: A Modular Full-stack RL Library for LLMs ☆1,806 · Updated this week
- [COLING25] CodeJudge Eval: Can Large Language Models be Good Judges in Code Understanding? ☆12 · Dec 3, 2024 · Updated last year
- Our library for RL environments + evals ☆4,057 · Apr 30, 2026 · Updated last week
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆35 · Apr 17, 2025 · Updated last year
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆639 · Apr 27, 2026 · Updated last week
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF ☆25 · Oct 8, 2024 · Updated last year
- SkillsBench evaluates how well skills work and how effective agents are at using them ☆1,079 · Apr 29, 2026 · Updated last week
- ☆270 · Apr 21, 2026 · Updated 2 weeks ago
- ☆63 · Feb 28, 2026 · Updated 2 months ago
- ☆38 · Mar 6, 2026 · Updated 2 months ago
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆378 · Aug 24, 2025 · Updated 8 months ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆4,831 · Apr 1, 2026 · Updated last month
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework ☆21,046 · Updated this week
- Benchmarking Goal-Oriented Software Engineering ☆149 · Jan 7, 2026 · Updated 3 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆672 · Jul 29, 2025 · Updated 9 months ago
- Evaluation utilities based on SymPy. ☆22 · Dec 12, 2024 · Updated last year
- Codebase for EnterpriseOps-Gym from ServiceNow ☆83 · Updated this week
- τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains ☆1,103 · Apr 30, 2026 · Updated last week
- Democratizing Reinforcement Learning for LLMs ☆5,462 · Updated this week
- A framework for few-shot evaluation of language models. ☆12,411 · Updated this week
- ☆12 · Feb 11, 2026 · Updated 2 months ago
- 🚧 Accepting Task Submissions 🚧 ☆149 · Updated this week
- Training a model similar to OpenAI DALL-E with volunteers from all over the Internet using hivemind and dalle-pytorch (NeurIPS 2021 demo) ☆27 · May 29, 2023 · Updated 2 years ago
- ☆27 · Apr 7, 2026 · Updated last month
- ☆4,471 · Apr 22, 2026 · Updated 2 weeks ago
- Agentic RL Training at Scale ☆1,338 · Updated this week
- A construction kit for reinforcement learning environment management. ☆425 · Apr 29, 2026 · Updated last week
- An interface library for RL post training with environments. ☆1,806 · Updated this week
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆857 · Jul 16, 2025 · Updated 9 months ago
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆26,832 · Updated this week
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆394 · Apr 15, 2026 · Updated 3 weeks ago
- slime is an LLM post-training framework for RL Scaling. ☆5,548 · Updated this week
- DSPy: The framework for programming—not prompting—language models ☆34,180 · Updated this week
- Data recipes and robust infrastructure for training AI agents ☆123 · Updated this week
- Agentless🐱: an agentless approach to automatically solve software development problems ☆2,038 · Dec 22, 2024 · Updated last year
- 🌎💪 BrowserGym, a Gym environment for web task automation ☆1,215 · Mar 17, 2026 · Updated last month
- nyc is so back ☆21 · Jun 27, 2025 · Updated 10 months ago