stanford-iris-lab/meta-harness-tbench2-artifact

Meta-Harness: 76.4% on Terminal-Bench 2.0 (Claude Opus 4.6)

☆1,142

Alternatives and similar repositories for meta-harness-tbench2-artifact

Users interested in meta-harness-tbench2-artifact are comparing it to the repositories listed below.

stanford-iris-lab / meta-harness
Reference code for the Meta-Harness paper.
☆1,271 • Updated this week
A-EVO-Lab / a-evolve
The official repository of "Position: Agentic Evolution is the Path to Evolving LLMs".
☆690 • Jun 29, 2026 • Updated 2 weeks ago
facebookresearch / HyperAgents
Self-referential self-improving agents that can optimize for any computable task
☆2,637 • May 9, 2026 • Updated 2 months ago
china-qijizhifeng / agentic-harness-engineering
Official AHE code — Agentic Harness Engineering: observability-driven automatic evolution of coding-agent harnesses (concurrent w/ meta-h…
☆738 • Jun 14, 2026 • Updated last month
llm-as-a-verifier / llm-as-a-verifier
☆523 • Jul 7, 2026 • Updated last week
Human-Agent-Society / CORAL
CORAL is a robust, lightweight infrastructure for multi-agent autonomous self-evolution, built for autoresearch. Works with Claude Code, …
☆809 • Updated this week
NVIDIA-NeMo / ProRL-Agent-Server
Agentic RL on Any Harness at Scale
☆648 • Updated this week
context-labs / HALO
Hierarchal Agent Loop Optimizer
☆1,096 • Jul 6, 2026 • Updated last week
ByteDance-Seed / EdgeBench
EdgeBench: Unveiling scaling laws of learning from real-world environments
☆343 • Updated this week
NousResearch / hermes-agent-self-evolution
⚒ Evolutionary self-improvement for Hermes Agent — optimize skills, prompts, and code using DSPy + GEPA
☆4,675 • Jun 17, 2026 • Updated 3 weeks ago
aisa-group / InferenceBench
Benchmarking Open-Ended Inference Optimization by AI Agents
☆32 • Jul 6, 2026 • Updated last week
howdymary / hermes-agent-metaharness
An implementation of a Meta Harness for Hermes.
☆100 • Updated this week
Job-Bench / job-bench-eval
Official eval scripts for JobBench
☆26 • Jul 5, 2026 • Updated last week
davebcn87 / pi-autoresearch
Autonomous experiment loop extension for pi
☆7,196 • Updated this week
camel-ai / seta-env
💻 SETA: Scaling Environments for Terminal Agents - Environments
☆141 • Feb 16, 2026 • Updated 4 months ago
evo-hq / evo
turns your codebase into an autoresearch loop — discovers what to measure, instruments the benchmark, then runs tree search with parallel…
☆1,294 • Jul 1, 2026 • Updated last week
lithos-ai / motus
The open-source agent-serving project
☆482 • Jun 8, 2026 • Updated last month
microsoft / SkillOpt
SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, va…
☆12,686 • Updated this week
garrytan / gbrain
Garry's Opinionated OpenClaw/Hermes Agent Brain
☆26,201 • Updated this week
NousResearch / hermes-agent
The agent that grows with you
☆214,747 • Updated this week
sentient-agi / EvoSkill
EvoSkill — An open-source framework that automatically discovers and synthesizes reusable agent skills from failed trajectories to improv…
☆1,035 • Jul 6, 2026 • Updated last week
caoshiyi / K-Search
Automated High-Performance GPU Kernel Generation
☆120 • Jun 1, 2026 • Updated last month
NousResearch / autoreason
Autoresearch for subjective domains.
☆590 • Apr 12, 2026 • Updated 3 months ago
princeton-pli / AggAgent
☆28 • Apr 29, 2026 • Updated 2 months ago
autolabhq / autolab
A benchmark for evaluating AI agents on frontier ultra long-horizon auto research tasks.
☆155 • Jun 17, 2026 • Updated 3 weeks ago
harbor-framework / terminal-bench
A benchmark for LLMs on complicated tasks in the terminal
☆2,448 • Updated this week
wbopan / retro-harness
RHO: Evolving Agents in the Dark — Retrospective Harness Optimization via Self-Preference. Improving LLM agents from unlabeled past traje…
☆41 • Jun 12, 2026 • Updated last month
zksha / alma
ALMA (Automated meta-Learning of Memory designs for Agentic systems) is a framework that meta-learns memory designs to replace human-engi…
☆226 • Apr 8, 2026 • Updated 3 months ago
browser-use / browser-harness
Browser Harness | Self-healing harness that enables LLMs to complete any task.
☆15,963 • Updated this week
PeterGriffinJin / Search-R1
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
☆5,099 • Nov 13, 2025 • Updated 8 months ago
facebookresearch / ProgramBench
Can Language Models Rebuild Programs From Scratch?
☆846 • Updated this week
aiming-lab / AutoResearchClaw
Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞
☆13,797 • Updated this week
NVIDIA / OpenShell
OpenShell is the safe, private runtime for autonomous AI agents.
☆7,695 • Updated this week
HKUDS / OpenSpace
"OpenSpace: The Quality-First Skill Hub for AI Agents" -- https://open-space.cloud/
☆6,732 • Updated this week
strukto-ai / mirage
The World's First Unified Virtual Filesystem For AI Agents
☆3,318 • Updated this week
Forward-Future / loopy
A library of practical AI-agent loops and an installable skill for finding, adapting, and designing repeatable agent workflows.
☆2,680 • Jul 7, 2026 • Updated last week
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,469 • Updated this week
davidliuk / graph-of-skills
Dependency-Aware Structural Retrieval for Massive Agent Skills
☆186 • May 4, 2026 • Updated 2 months ago
ultraworkers / claw-code
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
☆194,754 • Jun 26, 2026 • Updated 2 weeks ago