smallcloudai / refact-bench
A benchmarking tool for evaluating AI coding assistants on real-world software engineering tasks from the SWE-Bench dataset.
☆57 · Updated 4 months ago
Alternatives and similar repositories for refact-bench
Users who are interested in refact-bench are comparing it to the repositories listed below:
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆190 · Updated last week
- ☆117 · Updated 4 months ago
- Harness used to benchmark aider against SWE Bench benchmarks ☆75 · Updated last year
- Run SWE-bench evaluations remotely ☆41 · Updated 2 months ago
- ☆57 · Updated 8 months ago
- Coding problems used in aider's polyglot benchmark ☆183 · Updated 9 months ago
- CodeSage: Code Representation Learning At Scale (ICLR 2024) ☆113 · Updated 11 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.