Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
☆272Mar 29, 2026Updated 3 months ago
Alternatives and similar repositories for experiments
Users that are interested in experiments are comparing it to the libraries listed below.
- ☆106Jul 17, 2024Updated last year
- Run SWE-bench evaluations remotely☆73Aug 14, 2025Updated 10 months ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues?☆5,273Apr 1, 2026Updated 3 months ago
- Agentless🐱: an agentless approach to automatically solve software development problems☆2,074Dec 22, 2024Updated last year
- Official implementation of paper How to Understand Whole Repository? New SOTA on SWE-bench Lite (21.3%)☆98Mar 26, 2025Updated last year
- Harness used to benchmark aider against SWE Bench benchmarks☆85Jun 27, 2024Updated 2 years ago
- ☆36Jan 8, 2025Updated last year
- Enhanced fork of SWE-bench, tailored for OpenDevin's ecosystem.☆29May 26, 2024Updated 2 years ago
- Agentless Lite: RAG-based SWE-Bench software engineering scaffold☆49Apr 15, 2025Updated last year
- Enhancing AI Software Engineering with Repository-level Code Graph☆286Apr 1, 2025Updated last year
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]☆696Jul 29, 2025Updated 11 months ago
- ☆158Aug 27, 2024Updated last year
- Open-source repository for the OOPSLA'24 paper "CYCLE: Learning to Self-Refine Code Generation"☆10Mar 8, 2024Updated 2 years ago
- ☆13Mar 5, 2025Updated last year
- Agent computer interface for AI software engineer.☆130Apr 16, 2026Updated 2 months ago
- ☆139Jun 6, 2025Updated last year
- Commit0: Library Generation from Scratch☆191Feb 24, 2026Updated 4 months ago
- ☆28Jun 2, 2026Updated 3 weeks ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.☆14Sep 4, 2024Updated last year
- ☆59Jan 28, 2025Updated last year
- Codev-Bench (Code Development Benchmark), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev…☆48Nov 6, 2024Updated last year
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents☆682Jun 22, 2026Updated last week
- A project structure aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% tasks (pass@1) in SWE-be…☆3,091Apr 24, 2025Updated last year
- This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software E…☆1,438Jul 18, 2025Updated 11 months ago
- A multi-programming language benchmark for LLMs☆307Apr 12, 2026Updated 2 months ago
- Inference code of Lingma SWE-GPT☆260Dec 2, 2024Updated last year
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.☆539Jun 22, 2026Updated last week
- TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD)☆31Jun 18, 2026Updated last week
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation☆49Dec 22, 2023Updated 2 years ago
- ☆48Jun 11, 2026Updated 2 weeks ago
- [LREC-Coling 2024] PECC: Problem Extraction and Coding Challenges☆14May 30, 2024Updated 2 years ago
- ☆139May 8, 2025Updated last year
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation☆84Apr 28, 2026Updated 2 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git…☆14Apr 9, 2025Updated last year
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024☆208Aug 16, 2024Updated last year
- Contains the model patches and the eval logs from the passing swe-bench-lite run.☆10Jun 28, 2024Updated 2 years ago
- A package dedicated for running benchmark agreement testing☆19Sep 18, 2025Updated 9 months ago
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"☆704Mar 16, 2025Updated last year
- Voila! A smart automatic pet feeder using Arduino Uno + RTC time module for scheduling + multiple sensors.☆10Jun 4, 2024Updated 2 years ago