Meta-Harness: 76.4% on Terminal-Bench 2.0 (Claude Opus 4.6)
☆1,079 · Mar 26, 2026 · Updated 2 months ago
Alternatives and similar repositories for meta-harness-tbench2-artifact
Users that are interested in meta-harness-tbench2-artifact are comparing it to the libraries listed below.
- This repo explores how AMR can address tasks that are difficult for LLMs ☆13 · Jan 15, 2024 · Updated 2 years ago
- Repository for the paper: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning ☆18 · Feb 21, 2025 · Updated last year
- EMNLP 2022: "A Unified Positive-Unlabeled Learning Framework for Document-Level Relation Extraction with Different Levels of Labeling" ☆27 · Feb 3, 2023 · Updated 3 years ago
- ☆13 · Feb 15, 2023 · Updated 3 years ago
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆391 · Aug 24, 2025 · Updated 9 months ago
- Ludic – an LLM-RL library for the era of experience ☆63 · Jan 9, 2026 · Updated 5 months ago
- ☆32 · May 10, 2024 · Updated 2 years ago
- The official repo for the DanQing dataset. ☆36 · Mar 25, 2026 · Updated 2 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆760 · May 10, 2026 · Updated last month
- Official implementation of "What does CLIP know about a red circle? Visual Prompt Engineering for VLMs", ICCV 2023 ☆12 · Sep 21, 2023 · Updated 2 years ago
- Terra SDKs for React Native. Depends on TerraiOS and TerraAndroid ☆12 · Jun 4, 2026 · Updated last week
- NexAU (AU for Agent Universe), a general-purpose agent framework for building intelligent agents with tool capabilities. ☆118 · May 25, 2026 · Updated 2 weeks ago
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors" ☆21 · Dec 14, 2024 · Updated last year
- OpenCE (Open Context Engineering): A community toolkit to implement, evaluate, and combine LLM context strategies (RAG, ACE, Compression)… ☆332 · Nov 14, 2025 · Updated 6 months ago
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models" ☆23 · Mar 4, 2025 · Updated last year
- Stable Looped Models and their Scaling Laws ☆161 · May 17, 2026 · Updated 3 weeks ago
- Pure Java Llama2 inference with optional multi-GPU CUDA implementation ☆13 · Sep 2, 2023 · Updated 2 years ago
- ☆34 · Jul 8, 2025 · Updated 11 months ago
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators" ☆12 · Mar 25, 2025 · Updated last year
- An official codebase for "NormLens: Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Comm… ☆10 · May 9, 2024 · Updated 2 years ago
- A series of high-performance GEMM (General Matrix Multiply) implementations iteratively optimised for H100 GPUs in pure CUDA. ☆77 · Feb 18, 2026 · Updated 3 months ago
- ☆14 · Nov 19, 2024 · Updated last year
- ☆12 · Oct 25, 2024 · Updated last year
- Reinforcing General Reasoning without Verifiers ☆100 · Jun 24, 2025 · Updated 11 months ago
- Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]" ☆18 · Aug 27, 2025 · Updated 9 months ago
- ☆25 · Mar 9, 2026 · Updated 3 months ago
- ☆55 · Apr 18, 2026 · Updated last month
- Dataset from Tip of the Tongue Known-Item Retrieval (2021) paper. ☆12 · Nov 4, 2021 · Updated 4 years ago
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning" ☆39 · Oct 1, 2025 · Updated 8 months ago
- [SIGIR '26] Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation ☆43 · May 15, 2026 · Updated 3 weeks ago
- Verifiers for LLM Reinforcement Learning ☆80 · Apr 15, 2025 · Updated last year
- Dataset and baseline for Coling 2022 long paper (oral): "ConFiguRe: Exploring Discourse-level Chinese Figures of Speech" ☆13 · Jul 27, 2023 · Updated 2 years ago
- A collection of scripts and tools for analyzing SWE agents. ☆16 · May 7, 2025 · Updated last year
- Code repository for Practical XMPP, published by Packt ☆10 · Jan 30, 2023 · Updated 3 years ago
- ☆16 · Jun 25, 2025 · Updated 11 months ago
- GenAI agent providing security context and tooling for performing security analysis on CVEs, components, and more ☆30 · Updated this week
- Source code of our paper "Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation" @ ACL 2022 ☆13 · Apr 13, 2022 · Updated 4 years ago
- [AAAI 2025 (Oral)] SAIL: Sample-Centric In-Context Learning for Document Information Extraction ☆20 · Dec 24, 2024 · Updated last year
- ☆25 · Feb 6, 2023 · Updated 3 years ago