Aider-AI / aider-swe-benchLinks

Harness used to benchmark aider against SWE Bench benchmarks

☆72

Alternatives and similar repositories for aider-swe-bench

Users that are interested in aider-swe-bench are comparing it to the libraries listed below

Sorting:

Aider-AI / refactor-benchmark
Aider's refactoring benchmark exercises based on popular python repos
☆76Updated 9 months ago
aorwall / moatless-tree-search
☆99Updated 2 months ago
NL2Code / CodeR
☆159Updated 11 months ago
All-Hands-AI / openhands-aci
Agent computer interface for AI software engineer.
☆92Updated last week
SWE-agent / SWE-ReX
Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
☆273Updated last week
aorwall / SWE-bench-docker
☆100Updated last year
CognitionAI / devin-swebench-results
Cognition's results and methodology on SWE-bench
☆121Updated last year
SWE-bench / sb-cli
Run SWE-bench evaluations remotely
☆34Updated last week
ScalingIntelligence / codemonkeys
☆41Updated 6 months ago
yueqis / API-Based-Agent
☆54Updated last month
InternLM / SWE-Fixer
☆108Updated 2 months ago
ozyyshr / RepoGraph
Enhancing AI Software Engineering with Repository-level Code Graph
☆197Updated 4 months ago
agential-ai / agential
🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!
☆52Updated 3 weeks ago
aymeric-roucher / agent_reasoning_benchmark
🔧 Compare how Agent systems perform on several benchmarks. 📊🚀
☆99Updated 9 months ago
SWE-bench / experiments
Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
☆197Updated 3 weeks ago
letta-ai / sleep-time-compute
accompanying material for sleep-time compute paper
☆99Updated 3 months ago
All-Hands-AI / openhands-resolver
A system that tries to resolve all issues on a github repo with OpenHands.
☆110Updated 8 months ago
SWE-bench / SWE-smith
Scaling Data for SWE-agents
☆328Updated this week
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91Updated 6 months ago
Aider-AI / polyglot-benchmark
Coding problems used in aider's polyglot benchmark
☆162Updated 7 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119Updated last year
GoodAI / goodai-ltm-benchmark
A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you…
☆76Updated 7 months ago
JoshuaPurtell / SmallBench
Small, simple agent task environments for training and evaluation
☆18Updated 9 months ago
aorwall / moatless-testbeds
Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git…
☆14Updated 3 months ago
FSoft-AI4Code / RepoHyper
[FORGE 2025] Graph-based method for end-to-end code completion with context awareness on repository
☆64Updated 11 months ago
agiresearch / Formal-LLM
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
☆125Updated last year
commit-0 / commit0
Commit0: Library Generation from Scratch
☆160Updated 2 months ago
microsoft / CodePlan
Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024
☆73Updated 11 months ago
h2oai / enterprise-h2ogpte
Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform
☆87Updated last month
sorendunn / Agentless-Lite
Agentless Lite: RAG-based SWE-Bench software engineering scaffold
☆36Updated 3 months ago