Harbor is a framework for running agent evaluations and creating and using RL environments.
☆2,105 · May 25, 2026 · Updated this week
Alternatives and similar repositories for harbor
Users who are interested in harbor are comparing it to the libraries listed below.
- A benchmark for LLMs on complicated tasks in the terminal ☆2,264 · Jan 22, 2026 · Updated 4 months ago
- ☆247 · Apr 30, 2026 · Updated 3 weeks ago
- SkyRL: A Modular Full-stack RL Library for LLMs ☆1,867 · May 18, 2026 · Updated last week
- Multi-agent synthetic data generation pipeline capable of generating and validating long horizon terminal/coding tasks for RL training ☆63 · Jul 28, 2025 · Updated 9 months ago
- Our library for RL environments + evals ☆4,125 · Updated this week
- [COLING25] CodeJudge Eval: Can Large Language Models be Good Judges in Code Understanding? ☆12 · Dec 3, 2024 · Updated last year
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆656 · Updated this week
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆35 · Apr 17, 2025 · Updated last year
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF ☆25 · Oct 8, 2024 · Updated last year
- Open schema + CLI for repo-local agent trace capture, review, and upload to Hugging Face Hub. ☆75 · Updated this week
- SkillsBench evaluates how well skills work and how effective agents are at using them ☆1,202 · Updated this week
- ☆285 · May 18, 2026 · Updated last week
- Probing task; contextual embeddings -> textual definitions (EMNLP19) ☆11 · Apr 22, 2021 · Updated 5 years ago
- ☆68 · Feb 28, 2026 · Updated 2 months ago
- Open sourced backend for Martian's LLM Inference Provider Leaderboard ☆21 · Aug 13, 2024 · Updated last year
- ☆41 · Mar 6, 2026 · Updated 2 months ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆5,006 · Apr 1, 2026 · Updated last month
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆385 · Aug 24, 2025 · Updated 9 months ago
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework ☆21,514 · Updated this week
- A framework for few-shot evaluation of language models. ☆12,678 · May 11, 2026 · Updated 2 weeks ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆679 · Jul 29, 2025 · Updated 9 months ago
- Democratizing Reinforcement Learning for LLMs ☆5,548 · May 20, 2026 · Updated last week
- Evaluation utilities based on SymPy. ☆22 · Dec 12, 2024 · Updated last year
- Codebase for EnterpriseOps-Gym from ServiceNow ☆91 · May 8, 2026 · Updated 2 weeks ago
- Agentic Research and Evaluation Suite ☆94 · Apr 7, 2026 · Updated last month
- ☆4,492 · Apr 22, 2026 · Updated last month
- Benchmarking Goal-Oriented Software Engineering ☆158 · May 5, 2026 · Updated 3 weeks ago
- Agentic RL Training at Scale ☆1,384 · May 19, 2026 · Updated last week
- τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains ☆1,226 · Updated this week
- slime is an LLM post-training framework for RL Scaling. ☆5,774 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆28,137 · Updated this week
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆873 · Jul 16, 2025 · Updated 10 months ago
- ☆12 · Feb 11, 2026 · Updated 3 months ago
- Training a model similar to OpenAI DALL-E with volunteers from all over the Internet using hivemind and dalle-pytorch (NeurIPS 2021 demo) ☆27 · May 29, 2023 · Updated 2 years ago
- ☆27 · Apr 7, 2026 · Updated last month
- DSPy: The framework for programming—not prompting—language models ☆34,631 · Updated this week
- A construction kit for reinforcement learning environment management. ☆440 · Updated this week
- AllenAI's post-training codebase ☆3,729 · Updated this week
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆280 · Jul 13, 2025 · Updated 10 months ago