Harbor is a framework for running agent evaluations and creating and using RL environments.
☆1,436 · Apr 13, 2026 · Updated this week
Alternatives and similar repositories for harbor
Users that are interested in harbor are comparing it to the libraries listed below.
- A benchmark for LLMs on complicated tasks in the terminal ☆1,984 · Jan 22, 2026 · Updated 2 months ago
- Multi-agent synthetic data generation pipeline capable of generating and validating long horizon terminal/coding tasks for RL training ☆59 · Jul 28, 2025 · Updated 8 months ago
- [COLING25] CodeJudge Eval: Can Large Language Models be Good Judges in Code Understanding? ☆12 · Dec 3, 2024 · Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆35 · Apr 17, 2025 · Updated 11 months ago
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF ☆24 · Oct 8, 2024 · Updated last year
- Our library for RL environments + evals ☆3,986 · Apr 9, 2026 · Updated last week
- SkyRL: A Modular Full-stack RL Library for LLMs ☆1,759 · Updated this week
- ☆256 · Apr 9, 2026 · Updated last week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆621 · Updated this week
- SkillsBench evaluates how well skills work and how effective agents are at using them ☆936 · Mar 27, 2026 · Updated 2 weeks ago
- ☆35 · Mar 6, 2026 · Updated last month
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆367 · Aug 24, 2025 · Updated 7 months ago
- Benchmarking Goal-Oriented Software Engineering ☆134 · Jan 7, 2026 · Updated 3 months ago
- τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains ☆1,003 · Updated this week
- Evaluation utilities based on SymPy. ☆22 · Dec 12, 2024 · Updated last year
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆4,676 · Apr 1, 2026 · Updated 2 weeks ago
- Data recipes and robust infrastructure for training AI agents ☆118 · Apr 9, 2026 · Updated last week
- ☆12 · Feb 11, 2026 · Updated 2 months ago
- Training a model similar to OpenAI DALL-E with volunteers from all over the Internet using hivemind and dalle-pytorch (NeurIPS 2021 demo) ☆27 · May 29, 2023 · Updated 2 years ago
- ☆25 · Apr 7, 2026 · Updated last week
- Provider-agnostic, open-source evaluation infrastructure for language models ☆759 · Mar 16, 2026 · Updated last month
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆662 · Jul 29, 2025 · Updated 8 months ago
- Agentic RL Training at Scale ☆1,292 · Updated this week
- nyc is so back ☆21 · Jun 27, 2025 · Updated 9 months ago
- An interface library for RL post training with environments. ☆1,599 · Apr 8, 2026 · Updated last week
- Run evals using LLM ☆27 · Jan 8, 2026 · Updated 3 months ago
- Entropy Based Sampling and Parallel CoT Decoding ☆17 · Oct 9, 2024 · Updated last year
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆103 · Sep 24, 2025 · Updated 6 months ago
- ☆119 · Apr 1, 2026 · Updated 2 weeks ago
- [ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench ☆36 · Aug 12, 2025 · Updated 8 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆20,603 · Updated this week
- 🌎💪 BrowserGym, a Gym environment for web task automation ☆1,193 · Mar 17, 2026 · Updated 3 weeks ago
- Agentless🐱: an agentless approach to automatically solve software development problems ☆2,036 · Dec 22, 2024 · Updated last year
- Code and Data for Tau-Bench ☆1,178 · Mar 18, 2026 · Updated 3 weeks ago
- Training Models Daily ☆16 · Dec 19, 2023 · Updated 2 years ago
- A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding — they're redefining how software ch… ☆94 · Apr 9, 2026 · Updated last week
- ☆4,436 · Jul 31, 2025 · Updated 8 months ago
- slime is an LLM post-training framework for RL Scaling. ☆5,264 · Apr 9, 2026 · Updated last week
- Democratizing Reinforcement Learning for LLMs ☆5,402 · Updated this week