Aider-AI / aider-swe-bench
Harness used to benchmark aider against the SWE-bench benchmarks
☆52 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for aider-swe-bench
- Enhancing AI Software Engineering with Repository-level Code Graph ☆92 · Updated 2 months ago
- ☆80 · Updated 3 months ago
- Aider's refactoring benchmark exercises based on popular Python repos ☆44 · Updated last month
- ☆31 · Updated 2 weeks ago
- ☆255 · Updated last month
- ☆152 · Updated 2 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆99 · Updated this week
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆40 · Updated 3 months ago
- RepoQA: Evaluating Long-Context Code Understanding ☆99 · Updated last week
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆109 · Updated 4 months ago
- Enhanced fork of SWE-bench, tailored for OpenDevin's ecosystem. ☆20 · Updated 5 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆75 · Updated 3 weeks ago
- Graph-based method for end-to-end code completion with repository-level context awareness ☆44 · Updated 2 months ago
- r2e: turn any GitHub repository into a programming agent environment ☆88 · Updated last week
- Codebase accompanying the Summary of a Haystack paper. ☆72 · Updated last month
- Contains the prompts we use to talk to various LLMs for different utilities inside the editor ☆61 · Updated 9 months ago
- ☆50 · Updated 4 months ago
- ☆127 · Updated 3 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆119 · Updated 3 weeks ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆206 · Updated last month
- Data preparation code for CrystalCoder 7B LLM ☆42 · Updated 6 months ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral, ACL 2024 SRW ☆52 · Updated last month
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆194 · Updated 6 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆64 · Updated 4 months ago
- CodeRAG-Bench: Can Retrieval Augment Code Generation? ☆78 · Updated 4 months ago
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024 ☆133 · Updated 2 months ago
- ☆40 · Updated this week
- Source code for the paper: INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing ☆24 · Updated 4 months ago
- Synthetic Data for LLM Fine-Tuning ☆93 · Updated 11 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆47 · Updated 3 weeks ago