☆109 · Feb 12, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for terminal-bench-pro
Users who are interested in terminal-bench-pro are comparing it to the libraries listed below.
- [ICLR 2026] Official Implementation of "FeatureBench: Benchmarking Agentic Coding for Complex Feature Development" ☆25 · Updated this week
- BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated c… ☆40 · Apr 15, 2025 · Updated 10 months ago
- A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding, they're redefining how software ch… ☆85 · Updated this week
- [NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live! ☆167 · Feb 25, 2026 · Updated last week
- A First Look at Conventional Commits Classification ☆12 · Nov 18, 2024 · Updated last year
- ☆83 · Apr 18, 2024 · Updated last year
- Structured Chemistry Reasoning with Large Language Models ☆39 · May 4, 2024 · Updated last year
- STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models ☆36 · Feb 12, 2026 · Updated 3 weeks ago
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated last year
- Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal ☆25 · Updated this week
- ☆14 · Mar 6, 2022 · Updated 4 years ago
- EANN (PyTorch) ☆10 · Mar 12, 2022 · Updated 3 years ago
- ☆10 · Sep 18, 2021 · Updated 4 years ago
- Download Web-10K data by querying Bing Image Search ☆10 · Feb 1, 2022 · Updated 4 years ago
- ☆11 · Mar 15, 2024 · Updated last year
- Beijing Institute of Technology (BIT) grade-update lookup and weighted-average calculator ☆11 · Jul 28, 2024 · Updated last year
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022 ☆11 · Aug 20, 2022 · Updated 3 years ago
- Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main) ☆19 · Jul 19, 2025 · Updated 7 months ago
- Code of LeCoRE ☆13 · Feb 15, 2023 · Updated 3 years ago
- English and Chinese LaTeX template for reports/projects/proposals at Beijing Institute of Technology ☆10 · Nov 19, 2020 · Updated 5 years ago
- ☆18 · Jun 18, 2025 · Updated 8 months ago
- ☆10 · Apr 24, 2022 · Updated 3 years ago
- ☆12 · Mar 3, 2022 · Updated 4 years ago
- Python package for extractive NLP using the OpenAI API ☆17 · Aug 28, 2024 · Updated last year
- ☆11 · Aug 10, 2022 · Updated 3 years ago
- A Datasette instance for searching WebVid-10M ☆15 · Sep 30, 2022 · Updated 3 years ago
- [NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality ☆19 · Oct 22, 2025 · Updated 4 months ago
- ☆34 · Jan 25, 2026 · Updated last month
- ☆10 · Nov 14, 2021 · Updated 4 years ago
- ☆10 · Jun 21, 2021 · Updated 4 years ago
- Code for the "Long Context Needs Some R&R" paper. ☆12 · Mar 11, 2024 · Updated last year
- ☆11 · Mar 13, 2023 · Updated 2 years ago
- CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot ☆11 · Jul 11, 2023 · Updated 2 years ago
- WordPress plugin that provides a Gutenberg block for embedding the SemiAnalysis die yield calculator in posts and pages ☆18 · Oct 10, 2025 · Updated 4 months ago
- AI for Mathematics Paper List ☆17 · Jan 14, 2025 · Updated last year
- Python implementation of a Genetic Algorithm for the Resource-Constrained Project Scheduling Problem ☆14 · May 29, 2023 · Updated 2 years ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 · Dec 19, 2024 · Updated last year
- ☆23 · Jan 27, 2026 · Updated last month
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆18 · Apr 22, 2025 · Updated 10 months ago