Aider-AI / aider-swe-bench
Harness used to benchmark aider against the SWE-bench benchmarks
☆66 · Updated 8 months ago
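For context, harnesses in this space typically emit SWE-bench-style predictions: a JSONL file with one record per task instance. The sketch below is an illustration, not code from aider-swe-bench; the file name and helper function are hypothetical, and it only checks for the fields SWE-bench evaluation commonly expects.

```python
# Minimal sketch (assumption: predictions follow the documented SWE-bench format,
# with one JSON object per line carrying instance_id, model_name_or_path, and
# model_patch). Not taken from the aider-swe-bench repository.
import json
import sys

REQUIRED_FIELDS = {"instance_id", "model_name_or_path", "model_patch"}

def check_predictions(path: str) -> None:
    """Report records in a predictions JSONL file that lack required fields."""
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                print(f"line {line_no}: missing fields {sorted(missing)}")

if __name__ == "__main__":
    check_predictions(sys.argv[1])  # e.g. predictions.jsonl (hypothetical name)
```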
Alternatives and similar repositories for aider-swe-bench:
Users interested in aider-swe-bench are comparing it to the repositories listed below.
- ☆68 · Updated last month
- ☆85 · Updated 7 months ago
- Aider's refactoring benchmark exercises based on popular Python repos · ☆61 · Updated 4 months ago
- ☆155 · Updated 6 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph · ☆136 · Updated last month
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. · ☆101 · Updated this week
- ☆50 · Updated 3 months ago
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions · ☆41 · Updated 6 months ago
- [FORGE 2025] Graph-based method for end-to-end code completion with repository-level context awareness · ☆57 · Updated 6 months ago
- RepoQA: Evaluating Long-Context Code Understanding · ☆103 · Updated 4 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform · ☆83 · Updated 3 weeks ago
- A better way of testing, inspecting, and analyzing AI Agent traces. · ☆28 · Updated this week
- Just a bunch of benchmark logs for different LLMs · ☆119 · Updated 7 months ago
- ☆74 · Updated last year
- Agent computer interface for AI software engineer. · ☆38 · Updated this week
- Evaluating LLMs with CommonGen-Lite · ☆89 · Updated 11 months ago
- Enhanced fork of SWE-bench, tailored for OpenDevin's ecosystem. · ☆23 · Updated 9 months ago
- Codebase accompanying the Summary of a Haystack paper. · ☆75 · Updated 5 months ago
- Source code for paper: INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing · ☆26 · Updated 3 months ago
- ☆39 · Updated 7 months ago
- 🔧 Compare how agent systems perform on several benchmarks. 📊🚀 · ☆80 · Updated 4 months ago
- r2e: turn any GitHub repository into a programming agent environment · ☆100 · Updated this week
- A system that tries to resolve all issues on a GitHub repo with OpenHands. · ☆100 · Updated 3 months ago
- Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. · ☆150 · Updated this week
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral at ACL 2024 SRW · ☆58 · Updated 5 months ago
- ☆48 · Updated 3 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". · ☆66 · Updated 8 months ago
- ☆76 · Updated last week