Harbor is a framework for running agent evaluations and creating and using RL environments.
☆1,077 · Mar 23, 2026 · Updated this week
Alternatives and similar repositories for harbor
Users that are interested in harbor are comparing it to the libraries listed below.
- A benchmark for LLMs on complicated tasks in the terminal ☆1,768 · Jan 22, 2026 · Updated 2 months ago
- ☆134 · Feb 27, 2026 · Updated last month
- [COLING25] CodeJudge Eval: Can Large Language Models be Good Judges in Code Understanding? ☆12 · Dec 3, 2024 · Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆35 · Apr 17, 2025 · Updated 11 months ago
- ☆247 · Mar 19, 2026 · Updated last week
- Our library for RL environments + evals ☆3,918 · Mar 20, 2026 · Updated last week
- SkyRL: A Modular Full-stack RL Library for LLMs ☆1,713 · Updated this week
- ☆33 · Mar 6, 2026 · Updated 3 weeks ago
- Open sourced backend for Martian's LLM Inference Provider Leaderboard ☆21 · Aug 13, 2024 · Updated last year
- Multi-agent synthetic data generation pipeline capable of generating and validating long horizon terminal/coding tasks for RL training ☆58 · Jul 28, 2025 · Updated 7 months ago
- Benchmarking Goal-Oriented Software Engineering ☆123 · Jan 7, 2026 · Updated 2 months ago
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆602 · Updated this week
- Evaluation utilities based on SymPy. ☆22 · Dec 12, 2024 · Updated last year
- Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal ☆43 · Mar 19, 2026 · Updated last week
- Data recipes and robust infrastructure for training AI agents ☆111 · Mar 20, 2026 · Updated last week
- ☆12 · Feb 11, 2026 · Updated last month
- Training a model similar to OpenAI DALL-E with volunteers from all over the Internet using hivemind and dalle-pytorch (NeurIPS 2021 demo) ☆27 · May 29, 2023 · Updated 2 years ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆652 · Jul 29, 2025 · Updated 7 months ago
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆358 · Aug 24, 2025 · Updated 7 months ago
- Async RL Training at Scale ☆1,176 · Updated this week
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆4,527 · Mar 19, 2026 · Updated last week
- nyc is so back ☆21 · Jun 27, 2025 · Updated 8 months ago
- Run evals using LLM ☆27 · Jan 8, 2026 · Updated 2 months ago
- Entropy Based Sampling and Parallel CoT Decoding ☆17 · Oct 9, 2024 · Updated last year
- ☆12 · May 30, 2025 · Updated 9 months ago
- [ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench ☆36 · Aug 12, 2025 · Updated 7 months ago
- AAIF landscape ☆37 · Jan 15, 2026 · Updated 2 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆359 · Mar 18, 2026 · Updated last week
- Training Models Daily ☆16 · Dec 19, 2023 · Updated 2 years ago
- [ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution ☆296 · Mar 19, 2026 · Updated last week
- An interface library for RL post training with environments. ☆1,288 · Updated this week
- DS SERVE: The Largest Open Vector Store over Pretraining Data; A Framework for Efficient and Scalable Neural Retrieval ☆47 · Jan 28, 2026 · Updated last month
- Training GPTs to solve interaction nets ☆18 · Aug 14, 2024 · Updated last year
- A Gym for Agentic LLMs ☆467 · Jan 21, 2026 · Updated 2 months ago
- moodist ☆25 · Updated this week
- A construction kit for reinforcement learning environment management. ☆395 · Updated this week
- Democratizing Reinforcement Learning for LLMs ☆5,259 · Mar 19, 2026 · Updated last week
- A simple & powerful danmaku framework. ☆14 · Mar 17, 2023 · Updated 3 years ago
- ☆23 · Jan 31, 2025 · Updated last year