SWE-Perf / SWE-Perf
☆40 · Updated this week
Alternatives and similar repositories for SWE-Perf
Users who are interested in SWE-Perf are comparing it to the libraries listed below.
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆90 · Updated last month
- [NeurIPS 2025 D&B] SWE-bench Goes Live! ☆129 · Updated this week
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) ☆159 · Updated 2 months ago
- Must-read papers on Repository-level Code Generation & Issue Resolution 🔥 ☆195 · Updated last week
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models" ☆83 · Updated last year
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral, ACL 2024 SRW ☆62 · Updated last year
- LeetCode Training and Evaluation Dataset ☆39 · Updated 6 months ago
- ☆12 · Updated 3 months ago
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories ☆63 · Updated last year
- CodeRAG-Bench: Can Retrieval Augment Code Generation? ☆156 · Updated 11 months ago
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024 ☆174 · Updated last year
- ☆33 · Updated last month
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆178 · Updated 3 months ago
- Baselines for all tasks from Long Code Arena benchmarks 🏟️ ☆35 · Updated 7 months ago
- A Comprehensive Benchmark for Software Development ☆115 · Updated last year
- ☆53 · Updated last year
- Reproducing R1 for Code with Reliable Rewards ☆262 · Updated 5 months ago
- [NeurIPS'25] Official Implementation of RISE (Reinforcing Reasoning with Self-Verification) ☆30 · Updated 2 months ago
- NaturalCodeBench (Findings of ACL 2024) ☆67 · Updated last year
- [LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization ☆38 · Updated 7 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆114 · Updated 5 months ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆174 · Updated 5 months ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆154 · Updated last year
- A distributed, extensible, secure solution for evaluating machine-generated code with unit tests in multiple programming languages ☆56 · Updated last year
- ☆66 · Updated 10 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆115 · Updated 10 months ago
- ☆51 · Updated 5 months ago
- [EMNLP 2024] CodeJudge: Evaluating Code Generation with Large Language Models ☆50 · Updated last month
- ☆142 · Updated this week
- Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving ☆268 · Updated last week