Aider-AI / aider-swe-bench
Harness used to benchmark aider against SWE Bench benchmarks
☆53 Updated 4 months ago
Related projects
Alternatives and complementary repositories for aider-swe-bench
- ☆81 Updated 4 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆94 Updated 2 months ago
- ☆152 Updated 2 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆100 Updated this week
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆64 Updated this week
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆110 Updated 5 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆212 Updated last month
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆40 Updated 3 months ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆194 Updated 6 months ago
- ☆37 Updated 3 weeks ago
- RepoQA: Evaluating Long-Context Code Understanding ☆100 Updated 2 weeks ago
- Graph-based method for end-to-end code completion with context awareness on repository ☆47 Updated 2 months ago
- ☆264 Updated this week
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆52 Updated last month
- r2e: turn any github repository into a programming agent environment ☆89 Updated 3 weeks ago
- ☆35 Updated last year
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024 ☆133 Updated 3 months ago
- ☆38 Updated 4 months ago
- Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024 ☆52 Updated 2 months ago
- A hard gym for programming