laude-institute / harbor
Harbor is a framework for running agent evaluations and creating and using RL environments.
☆600 · Updated this week
Alternatives and similar repositories for harbor
Users who are interested in harbor are comparing it to the libraries listed below.
- A benchmark for LLMs on complicated tasks in the terminal ☆1,540 · Jan 22, 2026 · Updated 3 weeks ago
- Training Models Daily ☆16 · Dec 19, 2023 · Updated 2 years ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆35 · Apr 17, 2025 · Updated 10 months ago
- Fluid Language Model Benchmarking ☆26 · Sep 16, 2025 · Updated 5 months ago
- Our library for RL environments + evals ☆3,833 · Updated this week
- Training GPTs to solve interaction nets ☆18 · Aug 14, 2024 · Updated last year
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆318 · Jun 26, 2025 · Updated 7 months ago
- [ACL 2025 Main] Official Repository for "Evaluating Language Models as Synthetic Data Generators" ☆40 · Dec 13, 2024 · Updated last year
- Approximating the joint distribution of language models via MCTS ☆22 · Nov 3, 2024 · Updated last year
- Convert GitHub PRs into Harbor tasks ☆43 · Feb 7, 2026 · Updated last week
- ☆13 · Apr 7, 2024 · Updated last year
- ☆16 · Dec 2, 2025 · Updated 2 months ago
- Tiny Bolt ⚡️ app demonstrating how to build Slack apps utilizing Slack's new features and New Relic APIs ☆13 · Nov 25, 2019 · Updated 6 years ago
- Python client for Google Kaniko ☆11 · Jul 19, 2022 · Updated 3 years ago
- An Infr app that helps you replay & talk to everything you've ever seen. ☆15 · Sep 19, 2023 · Updated 2 years ago
- ☆12 · Updated this week
- Vast.ai python sdk ☆19 · Feb 6, 2026 · Updated last week
- Fine-tuning-free Shapley value (FreeShap) for instance attribution ☆14 · May 29, 2024 · Updated last year
- 📈 FinanceBench evaluation of Mafin 2.5 (Powered by PageIndex) ☆37 · Oct 20, 2025 · Updated 3 months ago
- ☆12 · May 30, 2025 · Updated 8 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆59 · Oct 18, 2025 · Updated 3 months ago
- SkyRL: A Modular Full-stack RL Library for LLMs ☆1,571 · Updated this week
- nyc is so back ☆20 · Jun 27, 2025 · Updated 7 months ago
- ☆13 · Jun 4, 2024 · Updated last year
- moodist ☆24 · Jan 6, 2026 · Updated last month
- Friday Agents. App: https://chat.toolstack.run/ ☆14 · Dec 18, 2024 · Updated last year
- Open Character Training ☆66 · Nov 24, 2025 · Updated 2 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆46 · Jan 29, 2026 · Updated 2 weeks ago
- Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆63 · Dec 25, 2023 · Updated 2 years ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆4,267 · Feb 3, 2026 · Updated 2 weeks ago
- MLX binary vectors and associated algorithms. ☆14 · Mar 13, 2025 · Updated 11 months ago
- ☆31 · Sep 28, 2025 · Updated 4 months ago
- Run SWE-bench evaluations remotely ☆56 · Aug 14, 2025 · Updated 6 months ago
- DeMo: Decoupled Momentum Optimization ☆198 · Dec 2, 2024 · Updated last year
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆347 · Aug 24, 2025 · Updated 5 months ago
- [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards ☆1,343 · Jan 16, 2026 · Updated last month
- The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution ☆219 · Feb 10, 2026 · Updated last week
- Async RL Training at Scale ☆1,071 · Updated this week
- Official codes for "Understanding Deep Gradient Leakage via Inversion Influence Functions", NeurIPS 2023 ☆16 · Oct 13, 2023 · Updated 2 years ago