multi-swe-bench / multi-swe-bench-env

☆1

Alternatives and similar repositories for multi-swe-bench-env

Users who are interested in multi-swe-bench-env are comparing it to the libraries listed below.

qishenghu / InstructCoder
InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw
☆62 Updated 10 months ago
RepoUnderstander / RepoUnderstander
Official implementation of the paper "How to Understand Whole Repository?" New SOTA on SWE-bench Lite (21.3%)
☆87 Updated 4 months ago
ntunlp / ExecEval
A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages.
☆56 Updated 9 months ago
thunlp / DebugBench
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".
☆80 Updated last year
xlang-ai / EVOR
☆67 Updated 7 months ago
amazon-science / cceval
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)
☆153 Updated last year
DeepSoftwareAnalytics / RLCoder
Reinforcement Learning for Repository-Level Code Completion
☆35 Updated 11 months ago
THUDM / NaturalCodeBench
NaturalCodeBench (Findings of ACL 2024)
☆68 Updated 9 months ago
ozyyshr / RepoGraph
Enhancing AI Software Engineering with Repository-level Code Graph
☆197 Updated 4 months ago
amazon-science / llm-code-preference
Training and Benchmarking LLMs for Code Preference.
☆34 Updated 8 months ago
evo-eval / evoeval
EvoEval: Evolving Coding Benchmarks via LLM
☆76 Updated last year
shrivastavadisha / repo_level_prompt_generation
☆124 Updated 2 years ago
seketeam / EvoCodeBench
An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories
☆62 Updated 11 months ago
logic-star-ai / swt-bench
[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation
☆51 Updated last week
CodeEditorBench / CodeEditorBench
☆51 Updated last year
Ablustrund / APPS_Plus
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
☆67 Updated 11 months ago
NL2Code / NL2Code.github.io
Large Language Models Meet NL2Code: A Survey
☆35 Updated 8 months ago
Leolty / repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆169 Updated 11 months ago
facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆151 Updated 9 months ago
open-compass / DevEval
A Comprehensive Benchmark for Software Development.
☆111 Updated last year
Zyq-scut / RLTF
Accepted by Transactions on Machine Learning Research (TMLR)
☆130 Updated 10 months ago
SparksofAGI / MHPP
☆32 Updated last month
aorwall / SWE-bench-docker
☆100 Updated last year
microsoft / ReACC
Source code for the paper "ReACC: A Retrieval-Augmented Code Completion Framework"
☆62 Updated 3 years ago
reddy-lab-code-research / PPOCoder
Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning"
☆114 Updated last year
R2E-Gym / R2E-Gym
Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆139 Updated 3 weeks ago
SalesforceAIResearch / swecomm
☆27 Updated 6 months ago
FudanSELab / ClassEval
Benchmark ClassEval for class-level code generation.
☆145 Updated 9 months ago
SalesforceAIResearch / CodeChain
Official code for the paper "CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules"
☆45 Updated 6 months ago
crux-eval / eval-arena
☆28 Updated 3 weeks ago