SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
☆4,437 · Feb 19, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for SWE-bench
Users who are interested in SWE-bench are comparing it to the libraries listed below.
- SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersec… ☆18,655 · Mar 2, 2026 · Updated last week
- Agentless🐱: an agentless approach to automatically solve software development problems ☆2,011 · Dec 22, 2024 · Updated last year
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆253 · Feb 27, 2026 · Updated 2 weeks ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆649 · Jul 29, 2025 · Updated 7 months ago
- A project structure aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% tasks (pass@1) in SWE-be… ☆3,061 · Apr 24, 2025 · Updated 10 months ago
- Code for the paper "Evaluating Large Language Models Trained on Code" ☆3,154 · Jan 17, 2025 · Updated last year
- 🙌 OpenHands: AI-Driven Development ☆68,865 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆72,827 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆24,216 · Updated this week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆589 · Updated this week
- Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024 ☆1,698 · Oct 2, 2025 · Updated 5 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆811 · Jul 16, 2025 · Updated 7 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆449 · Mar 2, 2026 · Updated last week
- DSPy: The framework for programming—not prompting—language models ☆32,696 · Updated this week
- ☆4,390 · Jul 31, 2025 · Updated 7 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆19,739 · Updated this week
- A framework for few-shot evaluation of language models. ☆11,618 · Mar 5, 2026 · Updated last week
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆37,994 · Mar 6, 2026 · Updated last week
- This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software E… ☆1,439 · Jul 18, 2025 · Updated 7 months ago
- A framework for the evaluation of autoregressive code generation language models. ☆1,020 · Jul 22, 2025 · Updated 7 months ago
- A programming framework for agentic AI ☆55,236 · Mar 5, 2026 · Updated last week
- ☆104 · Jul 17, 2024 · Updated last year
- ☆627 · Sep 1, 2025 · Updated 6 months ago
- aider is AI pair programming in your terminal ☆41,613 · Mar 3, 2026 · Updated last week
- Train transformer language models with reinforcement learning. ☆17,608 · Updated this week
- AllenAI's post-training codebase ☆3,614 · Updated this week
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆3,211 · Feb 8, 2026 · Updated last month
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ☆39,418 · Jun 2, 2025 · Updated 9 months ago
- Tools for merging pretrained large language models. ☆6,842 · Feb 28, 2026 · Updated last week
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL) ☆9,145 · Updated this week
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI ☆485 · Jan 3, 2026 · Updated 2 months ago
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆679 · Mar 16, 2025 · Updated 11 months ago
- Fast and memory-efficient exact attention ☆22,719 · Updated this week
- LlamaIndex is the leading document agent and OCR platform ☆47,608 · Updated this week
- Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks. ☆17,973 · Nov 3, 2025 · Updated 4 months ago
- Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls) ☆12,745 · Mar 3, 2026 · Updated last week
- Democratizing Reinforcement Learning for LLMs ☆5,196 · Updated this week
- LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath ☆9,478 · Jun 7, 2025 · Updated 9 months ago
- Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time. ☆21,456 · Mar 4, 2026 · Updated last week