amazon-science / SWE-PolyBench
SWE-PolyBench: A multi-language benchmark for repository-level evaluation of coding agents
☆77 · Updated this week
Alternatives and similar repositories for SWE-PolyBench
Users interested in SWE-PolyBench are comparing it to the libraries listed below.
- Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆246 · Updated last week
- Sandboxed code execution for AI agents, locally or in the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆430 · Updated this week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆538 · Updated this week
- Harbor is a framework for running agent evaluations and creating and using RL environments. ☆542 · Updated this week
- ☆223 · Updated this week
- CodeSage: Code Representation Learning At Scale (ICLR 2024) ☆116 · Updated last year
- Run SWE-bench evaluations remotely ☆53 · Updated 5 months ago
- Public repository containing METR's DVC pipeline for eval data analysis ☆199 · Updated last week
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI ☆477 · Updated last month
- ☆238 · Updated 2 months ago
- Harness used to benchmark aider against SWE Bench benchmarks ☆79 · Updated last year
- ☆106 · Updated last year
- Inference-time scaling for LLMs-as-a-judge. ☆328 · Updated 3 months ago
- ☆132 · Updated 8 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆625 · Updated 6 months ago
- τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment ☆717 · Updated last week
- This repository contains the toolkit for replicating results from our technical report. ☆200 · Updated 5 months ago
- ☆59 · Updated last year
- A system that tries to resolve all issues on a GitHub repo with OpenHands. ☆117 · Updated last year
- A benchmark for LLMs on complicated tasks in the terminal ☆1,494 · Updated 2 weeks ago
- Lightly-reviewed collection of community environments ☆210 · Updated last week
- Prompts used in the Automated Auditing Blog Post ☆137 · Updated 6 months ago
- Agent computer interface for AI software engineer. ☆116 · Updated 2 months ago
- Collection of evals for Inspect AI ☆357 · Updated this week
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆675 · Updated 10 months ago
- Data recipes and robust infrastructure for training AI agents ☆94 · Updated this week
- Live-SWE-agent: live, runtime self-evolving software engineering agent ☆240 · Updated 3 weeks ago
- Tutorial for building LLM router ☆244 · Updated last year
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆146 · Updated 11 months ago
- ☆76 · Updated 7 months ago