Measuring agents' ability to get work done on a computer
☆231 · Jun 10, 2026 · Updated this week
Alternatives and similar repositories for terminal-bench-3
Users that are interested in terminal-bench-3 are comparing it to the libraries listed below.
- Convert GitHub PRs into Harbor tasks ☆64 · Mar 10, 2026 · Updated 3 months ago
- A curated list of awesome Harbor ecosystem projects ☆41 · May 29, 2026 · Updated 2 weeks ago
- ☆14 · Feb 12, 2024 · Updated 2 years ago
- Verifiers for LLM Reinforcement Learning ☆80 · Apr 15, 2025 · Updated last year
- Repository for "Training Language Models To Explain Their Own Computations" ☆22 · Dec 22, 2025 · Updated 5 months ago
- Language models scale reliably with over-training and on downstream tasks ☆101 · Apr 2, 2024 · Updated 2 years ago
- TBD ☆59 · Mar 13, 2026 · Updated 3 months ago
- Lightly-reviewed collection of community environments ☆234 · Jun 5, 2026 · Updated last week
- Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals; ACL 2024 ☆13 · May 24, 2024 · Updated 2 years ago
- ☆46 · May 3, 2026 · Updated last month
- An algorithm for classification from a graph-sparse support ☆15 · Jan 30, 2019 · Updated 7 years ago
- Host CIFAR-10.2 Data Set ☆13 · Sep 22, 2021 · Updated 4 years ago
- Python library providing a simple, fully supervised sentence embedding technique for textual adversarial attacks. ☆13 · Dec 13, 2023 · Updated 2 years ago
- ☆41 · May 26, 2026 · Updated 2 weeks ago
- Comprehensive LLM evaluation framework: GPQA Diamond to Chatbot Arena. Tests all major models equally, easily extensible. ☆17 · Aug 22, 2024 · Updated last year
- [ECAI 2023] Official implementation of "FATRER: Full-Attention Topic Regularizer for Accurate and Robust Conversational Emotion Recogniti… ☆13 · Oct 9, 2023 · Updated 2 years ago
- ☆10 · Jan 28, 2024 · Updated 2 years ago
- ☆11 · Oct 26, 2022 · Updated 3 years ago
- ☆11 · Jan 26, 2020 · Updated 6 years ago
- ☆13 · Jul 28, 2023 · Updated 2 years ago
- JAX implementation of Large Language Models. You can train GPT-2-like model with 青空文庫 (aozora bunko-clean dataset) or any other text dat… ☆13 · Aug 5, 2024 · Updated last year
- Login script ☆12 · Nov 4, 2022 · Updated 3 years ago
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" ☆30 · Dec 18, 2024 · Updated last year
- JAX implementation of the Mistral 7b v0.1 model ☆13 · Mar 27, 2024 · Updated 2 years ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆23 · Aug 18, 2024 · Updated last year
- This is a repository for code, data, and models associated with the paper LLM-RUBRIC: A Multidimensional, Calibrated Approach to Automate… ☆33 · Mar 30, 2026 · Updated 2 months ago
- ☆24 · Feb 17, 2026 · Updated 3 months ago
- Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression." ☆18 · Dec 13, 2024 · Updated last year
- ☆18 · Feb 29, 2024 · Updated 2 years ago
- A blazing-fast PostgreSQL client built in Rust. No Electron. No JVM. No bloat. ☆83 · May 23, 2026 · Updated 3 weeks ago
- [NeurIPS 2022] "Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks" ☆13 · Nov 11, 2022 · Updated 3 years ago
- An automated data pipeline scaling RL to pretraining levels ☆77 · Jun 2, 2026 · Updated last week
- A Qwen-VL model that can be successfully fine-tuned with LoRA ☆16 · Oct 27, 2023 · Updated 2 years ago
- Example MLOps using BentoML & mlFlow ☆38 · May 9, 2021 · Updated 5 years ago
- Data creation, training and eval scripts for the IRCoder paper ☆21 · May 31, 2024 · Updated 2 years ago
- ☆16 · Jun 12, 2024 · Updated 2 years ago
- ☆12 · Feb 16, 2024 · Updated 2 years ago
- ☆19 · Dec 26, 2022 · Updated 3 years ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆43 · Dec 29, 2025 · Updated 5 months ago