multimodal-art-projection/NL2RepoBench

☆144

Alternatives and similar repositories for NL2RepoBench

Users who are interested in NL2RepoBench are comparing it to the repositories listed below.

zsworld6 / projdevbench
☆23 · May 7, 2026 · Updated 2 months ago
MiniMax-AI / mini-vela
☆37 · Apr 2, 2026 · Updated 3 months ago
kwaipilot / SWE-Compass
☆18 · Mar 28, 2026 · Updated 3 months ago
SWE-EVO / SWE-EVO
☆53 · May 3, 2026 · Updated 2 months ago
scaleapi / SWE-bench_Pro-os
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
☆485 · May 18, 2026 · Updated 2 months ago
LiberCoders / FeatureBench
[ICLR 2026] Official Implementation of "FeatureBench: Benchmarking Agentic Coding for Complex Feature Development"
☆83 · Jun 13, 2026 · Updated last month
finyorko / longcli-bench
LongCLI-Bench's official repository
☆44 · May 25, 2026 · Updated last month
Proximal-Labs / frontier-swe
FrontierSWE is an ultra long-horizon coding agent benchmark that tests implementation, performance eng and ML research
☆187 · Updated this week
R2E-Gym / R2E-Gym
[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆307 · Jul 13, 2025 · Updated last year
NJU-LINK / WebCompass
The Source Code for WebCompass
☆21 · May 2, 2026 · Updated 2 months ago
phonism / CP-Zero
Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.
☆18 · Apr 22, 2025 · Updated last year
multimodal-art-projection / KORGym
☆60 · May 21, 2025 · Updated last year
ZexuSun / AgentSkiller
☆30 · Feb 11, 2026 · Updated 5 months ago
menik1126 / Swing-Bench
[ICLR2026🔥Oral] SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving
☆15 · Feb 26, 2026 · Updated 4 months ago
Danau5tin / tbench-agentic-data-pipeline
Multi-agent synthetic data generation pipeline capable of generating and validating long horizon terminal/coding tasks for RL training
☆70 · Jul 28, 2025 · Updated 11 months ago
JiayiGeng / CAID
Code repo for paper: Effective Strategies for Asynchronous Software Engineering Agents
☆64 · Apr 2, 2026 · Updated 3 months ago
YerbaPage / CodeOCR
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding [ISSTA 2026]
☆29 · Feb 2, 2026 · Updated 5 months ago
aisa-group / PostTrainBench
Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours
☆462 · Updated this week
facebookresearch / ProgramBench
Can Language Models Rebuild Programs From Scratch?
☆855 · Updated this week
harbor-framework / harbor
Framework for evaluating and improving agents
☆3,320 · Updated this week
abundant-ai / swe-marathon
SWE-Marathon: an ultra long-horizon SWE benchmark
☆109 · Updated this week
SWE-rebench / SWE-rebench-V2
Tools and prompt templates used to build and evaluate SWE-rebench-v2 tasks for the paper.
☆71 · Mar 12, 2026 · Updated 4 months ago
microsoft / SWE-bench-Live
[NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live!
☆209 · Jun 11, 2026 · Updated last month
THUDM / SWE-Dev
[ACL25' Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline.
☆62 · Jul 21, 2025 · Updated 11 months ago
GAIR-NLP / self-improvement-reversal
☆13 · Jul 14, 2024 · Updated 2 years ago
SWE-bench / SWE-smith
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
☆709 · Jul 13, 2026 · Updated last week
DeepSoftwareAnalytics / Awesome-Issue-Resolution
Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
☆85 · Apr 22, 2026 · Updated 2 months ago
Timothyxxx / TestTimeTrainingPapers
☆59 · Apr 13, 2026 · Updated 3 months ago
DeepSoftwareAnalytics / swe-factory
[FSE'2026] SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks
☆183 · May 12, 2026 · Updated 2 months ago
whisperzqh / ProjectGen
☆15 · Nov 28, 2025 · Updated 7 months ago
JesseZZZZZ / RepoZero
RepoZero: Can LLMs Generate a Code Repository from Scratch? (https://arxiv.org/abs/2605.07122)
☆30 · Jun 4, 2026 · Updated last month
claw-eval / claw-eval
Claw-Eval is an evaluation harness for evaluating LLM as agents. All tasks verified by humans.
☆726 · May 17, 2026 · Updated 2 months ago
camel-ai / seta-env
💻 SETA: Scaling Environments for Terminal Agents - Environments
☆142 · Feb 16, 2026 · Updated 5 months ago
hkust-nlp / Toolathlon
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
☆430 · Updated this week
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆35 · Aug 15, 2024 · Updated last year
GAIR-NLP / OpenSWE
☆197 · Mar 16, 2026 · Updated 4 months ago
multimodal-art-projection / REER_DeepWriter
REverse-Engineered Reasoning for Open-Ended Generation
☆98 · Sep 10, 2025 · Updated 10 months ago
ernie-research / MEnvAgent
Official Code of MEnvAgent
☆23 · Feb 3, 2026 · Updated 5 months ago
hkust-nlp / LOCA-bench
Benchmarking Language Agents Under Controllable and Extreme Context Growth
☆50 · Apr 29, 2026 · Updated 2 months ago