SWE-bench / SWE-smith
Scaling Data for SWE-agents
☆256 · Updated this week
Alternatives and similar repositories for SWE-smith
Users interested in SWE-smith are comparing it to the repositories listed below.
- Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆76 · Updated 2 weeks ago
- Code for the paper "Training Software Engineering Agents and Verifiers with SWE-Gym" [ICML 2025] ☆486 · Updated last month
- A benchmark for LLMs on complicated tasks in the terminal ☆177 · Updated this week
- r2e: turn any GitHub repository into a programming agent environment ☆125 · Updated 2 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file ☆173 · Updated 3 months ago
- AWM: Agent Workflow Memory ☆275 · Updated 4 months ago
- Open-sourced predictions, execution logs, trajectories, and results from model inference and evaluation runs on the SWE-bench task ☆183 · Updated this week
- Sandboxed code execution for AI agents, locally or in the cloud. Massively parallel, easy to extend. Powering SWE-agent and more ☆224 · Updated last week
- SWE Arena ☆34 · Updated 2 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆212 · Updated last month
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆545 · Updated 3 months ago
- ☆86 · Updated 2 weeks ago
- RepoQA: Evaluating Long-Context Code Understanding ☆109 · Updated 7 months ago
- ☆158 · Updated 9 months ago
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement ☆100 · Updated 4 months ago
- ☆41 · Updated 4 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆184 · Updated 2 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆219 · Updated last month
- ☆97 · Updated 11 months ago
- ☆207 · Updated last month
- ☆97 · Updated last month
- 🚀 SWE-bench Goes Live! ☆65 · Updated last week
- ☆119 · Updated last month
- A benchmark that challenges language models to code solutions for scientific problems ☆124 · Updated 2 weeks ago
- Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving ☆192 · Updated this week
- [NeurIPS 2023 D&B] Code repository for the InterCode benchmark, https://arxiv.org/abs/2306.14898 ☆220 · Updated last year
- [ACL'25 Findings] SWE-Dev is an SWE agent with a scalable test-case construction pipeline ☆40 · Updated last week
- Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆79 · Updated this week
- A simple unified framework for evaluating LLMs ☆217 · Updated 2 months ago
- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings ☆41 · Updated 4 months ago