laude-institute / harbor

Harbor is a framework for running agent evaluations and creating and using RL environments.

☆600

Alternatives and similar repositories for harbor

Users who are interested in harbor compare it to the libraries listed below.

laude-institute / terminal-bench
A benchmark for LLMs on complicated tasks in the terminal
☆1,540 · Jan 22, 2026 · Updated 3 weeks ago
Algomancer / The-Daily-Train
Training Models Daily
☆16 · Dec 19, 2023 · Updated 2 years ago
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆35 · Apr 17, 2025 · Updated 10 months ago
allenai / fluid-benchmarking
Fluid Language Model Benchmarking
☆26 · Sep 16, 2025 · Updated 5 months ago
PrimeIntellect-ai / verifiers
Our library for RL environments + evals
☆3,833 · Updated this week
reissbaker / clevergpt
Training GPTs to solve interaction nets
☆18 · Aug 14, 2024 · Updated last year
aidanmclaughlin / AidanBench
Aidan Bench attempts to measure <big_model_smell> in LLMs.
☆318 · Jun 26, 2025 · Updated 7 months ago
neulab / data-agora
[ACL 2025 Main] Official Repository for "Evaluating Language Models as Synthetic Data Generators"
☆40 · Dec 13, 2024 · Updated last year
doomslide / autoloom
Approximating the joint distribution of language models via MCTS
☆22 · Nov 3, 2024 · Updated last year
abundant-ai / SWE-gen
Convert GitHub PRs into Harbor tasks
☆43 · Feb 7, 2026 · Updated last week
apple / ml-ogen
☆13 · Apr 7, 2024 · Updated last year
centerforaisafety / textquests
☆16 · Dec 2, 2025 · Updated 2 months ago
seratch / new-relic-dashboard-in-slack
Tiny Bolt ⚡️ app demonstrating how to build Slack apps utilizing Slack's new features and New Relic APIs
☆13 · Nov 25, 2019 · Updated 6 years ago
nxexox / pykaniko
Python client for Google Kaniko
☆11 · Jul 19, 2022 · Updated 3 years ago
InfrHQ / Replay
An Infr app that helps you replay & talk to everything you've ever seen.
☆15 · Sep 19, 2023 · Updated 2 years ago
r-three / realistic_evaluation_of_model_merging_for_compositional_generalization
☆12 · Updated this week
vast-ai / vast-sdk
Vast.ai python sdk
☆19 · Feb 6, 2026 · Updated last week
JTWang2000 / FreeShap
Fine-tuning-free Shapley value (FreeShap) for instance attribution
☆14 · May 29, 2024 · Updated last year
VectifyAI / Mafin2.5-FinanceBench
📈 FinanceBench evaluation of Mafin 2.5 (Powered by PageIndex)
☆37 · Oct 20, 2025 · Updated 3 months ago
ridgesai / ridges-old
☆12 · May 30, 2025 · Updated 8 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆59 · Oct 18, 2025 · Updated 3 months ago
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆1,571 · Updated this week
haizelabs / nyc-ai-reading
nyc is so back
☆20 · Jun 27, 2025 · Updated 7 months ago
angie-chen55 / pref-learning-ranking-acc
☆13 · Jun 4, 2024 · Updated last year
facebookresearch / moodist
moodist
☆24 · Jan 6, 2026 · Updated last month
amirrezasalimi / friday-agents
Friday Agents. App: https://chat.toolstack.run/
☆14 · Dec 18, 2024 · Updated last year
maiush / OpenCharacterTraining
Open Character Training
☆66 · Nov 24, 2025 · Updated 2 months ago
sotopia-lab / sotopia-rl
Sotopia-RL: Reward Design for Social Intelligence
☆46 · Jan 29, 2026 · Updated 2 weeks ago
hkust-nlp / felm
Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆63 · Dec 25, 2023 · Updated 2 years ago
SWE-bench / SWE-bench
SWE-bench: Can Language Models Resolve Real-world Github Issues?
☆4,267 · Feb 3, 2026 · Updated 2 weeks ago
N8python / binary-vectors-mlx
MLX binary vectors and associated algorithms.
☆14 · Mar 13, 2025 · Updated 11 months ago
myracheng / elephant
☆31 · Sep 28, 2025 · Updated 4 months ago
SWE-bench / sb-cli
Run SWE-bench evaluations remotely
☆56 · Aug 14, 2025 · Updated 6 months ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆198 · Dec 2, 2024 · Updated last year
Danau5tin / terminal-bench-rl
GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T…
☆347 · Aug 24, 2025 · Updated 5 months ago
open-thought / reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
☆1,343 · Jan 16, 2026 · Updated last month
hkust-nlp / Toolathlon
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
☆219 · Feb 10, 2026 · Updated last week
PrimeIntellect-ai / prime-rl
Async RL Training at Scale
☆1,071 · Updated this week
illidanlab / inversion-influence-function
Official codes for "Understanding Deep Gradient Leakage via Inversion Influence Functions", NeurIPS 2023
☆16 · Oct 13, 2023 · Updated 2 years ago
