microsoft/SWE-bench-Live

[NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live!

☆212

Alternatives and similar repositories for SWE-bench-Live

Users that are interested in SWE-bench-Live are comparing it to the libraries listed below.

microsoft / RepoLaunch
Automate the build, execution and test of GitHub repositories across programming languages and operating systems.
☆127 • Jun 16, 2026 • Updated last month
SWE-bench / SWE-smith
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
☆710 • Updated this week
multi-swe-bench / multi-swe-bench
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
☆354 • Dec 18, 2025 • Updated 7 months ago
InternLM / SWE-Fixer
☆139 • May 8, 2025 • Updated last year
R2E-Gym / R2E-Gym
[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆309 • Jul 13, 2025 • Updated last year
microsoft / FEA-Bench
[ACL25] FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
☆57 • Jan 28, 2026 • Updated 5 months ago
JetBrains-Research / EnvBench
[DL4C @ ICLR 2025] A Benchmark for Automated Environment Setup
☆38 • Nov 9, 2025 • Updated 8 months ago
DeepSoftwareAnalytics / swe-factory
[FSE'2026] SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks
☆183 • May 12, 2026 • Updated 2 months ago
SWE-Perf / SWE-Perf
☆52 • Oct 28, 2025 • Updated 8 months ago
SWE-Gym / SWE-Gym
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
☆709 • Jul 29, 2025 • Updated 11 months ago
OpenAgentEval / SWE-ABS
[ICML 2026] SWE-ABS: Adversarial Benchmark Strengthening Exposes Inflated Success Rates on Test-based Benchmark
☆22 • May 6, 2026 • Updated 2 months ago
logic-star-ai / swt-bench
[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation
☆85 • Updated this week
SWE-Gym / SWE-Bench-Fork
☆13 • Mar 5, 2025 • Updated last year
DeepSoftwareAnalytics / Awesome-Issue-Resolution
Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
☆85 • Apr 22, 2026 • Updated 3 months ago
CodeClash-ai / CodeClash
Benchmarking Goal-Oriented Software Engineering
☆190 • Jul 16, 2026 • Updated last week
bytedance / Repo2Run
Repo2Run is an LLM-based agent that automates environment configuration by generating error-free Dockerfiles for Python repositories.
☆195 • Jun 10, 2026 • Updated last month
SWE-bench / SWE-bench
SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
☆5,482 • Apr 1, 2026 • Updated 3 months ago
NoCode-bench / NoCode-bench
☆21 • May 20, 2026 • Updated 2 months ago
CUHK-Shenzhen-SE / UTBoost
[ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
☆36 • Aug 12, 2025 • Updated 11 months ago
yingweima2022 / CodeLLM
☆12 • Jan 31, 2024 • Updated 2 years ago
scaleapi / SWE-bench_Pro-os
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
☆487 • May 18, 2026 • Updated 2 months ago
phonism / CP-Zero
Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.
☆18 • Apr 22, 2025 • Updated last year
facebookresearch / swe-rl
[NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
☆712 • Mar 16, 2025 • Updated last year
LiberCoders / FeatureBench
[ICLR 2026] Official Implementation of "FeatureBench: Benchmarking Agentic Coding for Complex Feature Development"
☆83 • Jun 13, 2026 • Updated last month
TuringEnterprises / SWE-Bench-plus-plus
SWE-Bench-plus-plus
☆25 • Feb 5, 2026 • Updated 5 months ago
waynchi / editbench
☆31 • Apr 7, 2026 • Updated 3 months ago
kwaipilot / SWE-Compass
☆18 • Mar 28, 2026 • Updated 3 months ago
commit-0 / commit0
Commit0: Library Generation from Scratch
☆189 • Feb 24, 2026 • Updated 5 months ago
THUDM / SWE-Dev
[ACL'25 Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline.
☆64 • Jul 21, 2025 • Updated last year
SWE-agent / SWE-ReX
Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
☆555 • Updated this week
LiveCodeBench / LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
☆913 • Jul 16, 2025 • Updated last year
harbor-framework / terminal-bench
A benchmark for LLMs on complicated tasks in the terminal
☆2,482 • Jul 11, 2026 • Updated 2 weeks ago
lldong / tiny-trae
A minimal AI coding agent powered by Anthropic's Claude. Interactive terminal interface with tool execution. Visit https://lldong.github.…
☆16 • Jul 15, 2025 • Updated last year
Danau5tin / tbench-agentic-data-pipeline
Multi-agent synthetic data generation pipeline capable of generating and validating long horizon terminal/coding tasks for RL training
☆70 • Jul 28, 2025 • Updated 11 months ago
OpenAutoCoder / Agentless
Agentless🐱: an agentless approach to automatically solve software development problems
☆2,085 • Dec 22, 2024 • Updated last year
tongye98 / Awesome-Code-Benchmark
A comprehensive review of code-domain benchmarks in LLM research.
☆236 • Jun 25, 2026 • Updated 3 weeks ago
SkyworkAI / MindLink
☆100 • Aug 8, 2025 • Updated 11 months ago
multi-swe-bench / MagentLess
☆13 • Jul 31, 2025 • Updated 11 months ago
zsworld6 / projdevbench
☆23 • May 7, 2026 • Updated 2 months ago