Aider-AI / aider-swe-bench
Harness used to benchmark aider against SWE Bench benchmarks
☆63 Updated 7 months ago
Alternatives and similar repositories for aider-swe-bench:
Users who are interested in aider-swe-bench are comparing it to the repositories listed below.
- ☆56 Updated last week
- ☆82 Updated 6 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆127 Updated 3 weeks ago
- ☆153 Updated 5 months ago
- Aider's refactoring benchmark exercises based on popular python repos ☆52 Updated 3 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆136 Updated last week
- RepoQA: Evaluating Long-Context Code Understanding ☆104 Updated 2 months ago
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆41 Updated 5 months ago
- ☆69 Updated 2 weeks ago
- ☆47 Updated 2 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. ☆55 Updated this week
- Just a bunch of benchmark logs for different LLMs ☆117 Updated 6 months ago
- ☆38 Updated 6 months ago
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆51 Updated this week
- Data preparation code for CrystalCoder 7B LLM ☆44 Updated 8 months ago
- ☆336 Updated this week
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆114 Updated 7 months ago
- Graph-based method for end-to-end code completion with context awareness on repository ☆56 Updated 4 months ago
- ☆74 Updated last year
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆55 Updated 3 months ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 2 months ago
- ☆74 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆74 Updated 4 months ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆22 Updated last month
- Source code for paper: INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing ☆26 Updated 2 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆251 Updated 2 weeks ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆120 Updated this week
- Score LLM pretraining data with classifiers ☆54 Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆98 Updated 4 months ago