logic-star-ai/swt-bench

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/logic-star-ai/swt-bench)

logic-star-ai / swt-bench

[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation

☆85

Alternatives and similar repositories for swt-bench

Users that are interested in swt-bench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

logpai / hybridlogparser
View on GitHub
A toolkit for hybrid log parsing
☆18Aug 23, 2023Updated 2 years ago
aorwall / moatless-tree-search
View on GitHub
☆141Jun 6, 2025Updated last year
sorendunn / Agentless-Lite
View on GitHub
Agentless Lite: RAG-based SWE-Bench software engineering scaffold
☆49Apr 15, 2025Updated last year
thunlp / DebugBench
View on GitHub
The repository for paper "DebugBench: "Evaluating Debugging Capability of Large Language Models".
☆87Jul 13, 2024Updated 2 years ago
IBM / TDD-Bench-Verified
View on GitHub
TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD)
☆33Updated this week
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
yaof20 / ReaL
View on GitHub
Implementation and datasets for "Training Language Models to Generate Quality Code with Program Analysis Feedback"
☆42Jul 21, 2025Updated last year
CUHK-Shenzhen-SE / D4C
View on GitHub
[ICSE'25] Aligning the Objective of LLM-based Program Repair
☆24Mar 8, 2025Updated last year
SalesforceAIResearch / swecomm
View on GitHub
☆28Jun 2, 2026Updated last month
JUnitContest / JUGE
View on GitHub
junit tools contest infrastructure
☆13Feb 9, 2024Updated 2 years ago
amazon-science / SWE-PolyBench
View on GitHub
SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents
☆89Jul 10, 2026Updated 2 weeks ago
OpenAutoCoder / Agentless
View on GitHub
Agentless🐱: an agentless approach to automatically solve software development problems
☆2,085Dec 22, 2024Updated last year
yiqingxyq / RepoST
View on GitHub
Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"
☆24Mar 18, 2025Updated last year
SWE-bench / SWE-smith
View on GitHub
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
☆711Updated this week
finyorko / longcli-bench
View on GitHub
LongCLI-Bench's official repository
☆44May 25, 2026Updated 2 months ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
aorwall / moatless-tools
View on GitHub
☆641Sep 1, 2025Updated 10 months ago
ScalingIntelligence / codemonkeys
View on GitHub
☆59Jan 28, 2025Updated last year
zjulgc / llmpeft4apr
View on GitHub
☆16Nov 9, 2024Updated last year
NJU-iSE / FUEL
View on GitHub
This repo is the artifact of FUEL
☆17May 19, 2026Updated 2 months ago
kupl / npex
View on GitHub
☆15Oct 11, 2023Updated 2 years ago
LiberCoders / FeatureBench
View on GitHub
[ICLR 2026] Official Implementation of "FeatureBench: Benchmarking Agentic Coding for Complex Feature Development"
☆83Jun 13, 2026Updated last month
logpai / AutoLog
View on GitHub
AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection [ASE'23]
☆41Feb 20, 2024Updated 2 years ago
RaoNikitha / CAT-LM
View on GitHub
☆15Feb 28, 2024Updated 2 years ago
OpenAgentEval / SWE-ABS
View on GitHub
[ICML 2026] SWE-ABS: Adversarial Benchmark Strengthening Exposes Inflated Success Rates on Test-based Benchmark
☆22May 6, 2026Updated 2 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
zysszy / CAT
View on GitHub
Improving Machine Translation Systems via Isotopic Replacement
☆12Apr 14, 2023Updated 3 years ago
stanfordnlp / multi-distribution-retrieval
View on GitHub
Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval
☆17Jan 16, 2024Updated 2 years ago
congy / AutoSuggest
View on GitHub
☆14Mar 13, 2021Updated 5 years ago
evo-eval / evoeval
View on GitHub
EvoEval: Evolving Coding Benchmarks via LLM
☆84Apr 6, 2024Updated 2 years ago
CLUEbenchmark / KGQA
View on GitHub
Knowledge Graph based Question Answering benchmark.
☆10Feb 1, 2020Updated 6 years ago
EhsanMashhadi / MSR2021-ProgramRepair
View on GitHub
Code of our paper Applying CodeBERT for Automated Program Repair of Java Simple Bugs which is accepted to MSR 2021.
☆52Nov 27, 2022Updated 3 years ago
scaleapi / SWE-bench_Pro-os
View on GitHub
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
☆487May 18, 2026Updated 2 months ago
rizwan09 / REDCODER
View on GitHub
☆44Jun 24, 2025Updated last year
AgenticIR-Lab / OThink-R1
View on GitHub
This is the official code for OThink-R1 project.
☆21Jun 19, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
INK-USC / Reflect
View on GitHub
Data and Code for Paper "Reflect Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality" (EMNLP 2022)
☆11Nov 28, 2022Updated 3 years ago
Code4Agent / codeagent
View on GitHub
☆22Jul 16, 2024Updated 2 years ago
DeepSoftwareAnalytics / RLCoder
View on GitHub
Reinforcement Learning for Repository-Level Code Completion
☆43Jun 15, 2026Updated last month
MrVPlusOne / Coeditor
View on GitHub
Coeditor: Leveraging Repo-level Diffs for Code Auto-editing
☆31Feb 25, 2024Updated 2 years ago
microsoft / FEA-Bench
View on GitHub
[ACL25] FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
☆57Jan 28, 2026Updated 5 months ago
JetBrains-Research / EnvBench
View on GitHub
[DL4C @ ICLR 2025] A Benchmark for Automated Environment Setup
☆38Nov 9, 2025Updated 8 months ago
Leolty / repobench
View on GitHub
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆214Aug 16, 2024Updated last year