harbor-framework/terminal-bench-3

Measuring agents' ability to get work done on a computer

☆257

Alternatives and similar repositories for terminal-bench-3

Users who are interested in terminal-bench-3 are comparing it to the repositories listed below.

bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆80 · Apr 15, 2025 · Updated last year
TransluceAI / introspective-interp
Repository for "Training Language Models To Explain Their Own Computations"
☆22 · Dec 22, 2025 · Updated 6 months ago
SWE-bench / SWE-smith
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
☆688 · Updated this week
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆101 · Apr 2, 2024 · Updated 2 years ago
HongtengXu / SGWB-Graphon
Learning Graphons via Structured Gromov-Wasserstein Barycenters
☆23 · Dec 13, 2020 · Updated 5 years ago
McGill-NLP / CHASE
Synthetic Data Generation for Evaluation
☆15 · Feb 21, 2025 · Updated last year
francescortu / comp-mech
Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals; ACL 2024
☆13 · May 24, 2024 · Updated 2 years ago
SWE-EVO / SWE-EVO
☆48 · May 3, 2026 · Updated 2 months ago
fraenkel-lab / GSLR
An algorithm for classification from a graph-sparse support
☆15 · Jan 30, 2019 · Updated 7 years ago
modestyachts / cifar-10.2
Host CIFAR-10.2 Data Set
☆13 · Sep 22, 2021 · Updated 4 years ago
DavidHerel / semantics-preserving-encoder
Python library providing a simple, fully supervised sentence embedding technique for textual adversarial attacks.
☆13 · Dec 13, 2023 · Updated 2 years ago
allenai / olmix
☆41 · May 26, 2026 · Updated last month
utcsilab / ambient-diffusion-mri
[ICLR 2025] Official Implementation: "Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models trained on Cor…
☆25 · Feb 27, 2025 · Updated last year
wxjiao / Pre-CODE
Implementation of our paper "Exploiting Unsupervised Data for Emotion Recognition in Conversations" in the Findings of EMNLP-2020.
☆13 · Nov 17, 2020 · Updated 5 years ago
MJ-Jang / BECEL
☆10 · Jan 28, 2024 · Updated 2 years ago
zhaowei-wang-nlp / DivScene
The code of the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects"
☆19 · May 2, 2025 · Updated last year
hongyurain / DRAGON
☆13 · Jul 28, 2023 · Updated 2 years ago
whoami-xu / srun_ict_2022
Login script
☆12 · Nov 4, 2022 · Updated 3 years ago
aryopg / decore
Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination"
☆30 · Dec 18, 2024 · Updated last year
yuzhaouoe / pretraining-data-packing
[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training
☆23 · Aug 18, 2024 · Updated last year
alessiodevoto / l2compress
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆18 · Dec 13, 2024 · Updated last year
PKU-ML / Message-Passing-Contrastive-Learning
Official Code for ICLR 2023 Paper: A Message Passing Perspective on Learning Dynamics of Contrastive Learning
☆11 · Mar 9, 2023 · Updated 3 years ago
RoyalSkye / ATCL
[NeurIPS 2022] "Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks"
☆13 · Nov 11, 2022 · Updated 3 years ago
SalesforceAIResearch / PretrainRL-pipeline
An automated data pipeline scaling RL to pretraining levels
☆76 · Jun 2, 2026 · Updated last month
LingxiaoShawn / GLOD-Issues
Source code and additional results for GLOD issues
☆12 · Jan 19, 2023 · Updated 3 years ago
Mingrui-Li / Qwen-VL-Lora-Model
A Qwen-VL model that can be successfully fine-tuned with LoRA
☆16 · Oct 27, 2023 · Updated 2 years ago
UKPLab / acl2024-ircoder
Data creation, training and eval scripts for the IRCoder paper
☆21 · May 31, 2024 · Updated 2 years ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆43 · Dec 29, 2025 · Updated 6 months ago
discus0434 / evaluate-images-to-feed-diffusion
Small notebook to preprocess and evaluate images.
☆14 · Nov 11, 2022 · Updated 3 years ago
HJSang / OPSD_OnPolicyDistillation
On-Policy Distillation built on top of Verl
☆89 · May 25, 2026 · Updated last month
NorthwaveSecurity / linkedin-crawler
Obtain emails using the LinkedIn Graph API
☆11 · Oct 1, 2025 · Updated 9 months ago
zhangyifei01 / SEED_ICLR21
SEED: Self-supervised Distillation for Visual Representation
☆16 · Jul 20, 2022 · Updated 3 years ago
Niuchx / HimNet
Code for Graph-level Anomaly Detection via Hierarchical Memory Networks (HimNet)
☆18 · Oct 6, 2023 · Updated 2 years ago
InfiMM / mllm-hd
Official code for infimm-hd
☆16 · Sep 4, 2024 · Updated last year
aryopg / mmlu-redux
☆31 · Nov 9, 2024 · Updated last year
isaiah-harville / NIDS
Real-Time Network Intrusion Detection Framework
☆14 · Mar 21, 2025 · Updated last year
maumueller / ann-benchmarks
Benchmarking approximate nearest neighbors. Note: This is an archived version from our SISAP 2017 paper, see below.
☆28 · May 3, 2018 · Updated 8 years ago
MathGenie / MathGenie
☆14 · Mar 11, 2024 · Updated 2 years ago
H-Freax / Awesome-Graph-RAG
This repository compiles a list of papers/resources related to the graph retrieval-augmented generation! Star⭐ the repo and follow me if …
☆10 · Dec 7, 2024 · Updated last year