SWE-Gym / SWE-Bench-Fork
☆12 Updated 10 months ago
Alternatives and similar repositories for SWE-Bench-Fork
Users who are interested in SWE-Bench-Fork compare it to the repositories listed below.
- ☆56 Updated last year
- ☆31 Updated last year
- Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling' ☆27 Updated 7 months ago
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning ☆147 Updated 3 months ago
- Implementation and datasets for "Training Language Models to Generate Quality Code with Program Analysis Feedback" ☆37 Updated 5 months ago
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆100 Updated 3 months ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆74 Updated last year
- NaturalCodeBench (Findings of ACL 2024) ☆69 Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆120 Updated 8 months ago
- ☆32 Updated 2 weeks ago
- ☆69 Updated 7 months ago
- ☆55 Updated 3 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆95 Updated 9 months ago
- Baselines for all tasks from Long Code Arena benchmarks 🏟️ ☆39 Updated 9 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆29 Updated 3 months ago
- ☆88 Updated 2 months ago
- ☆23 Updated last year
- ☆28 Updated 2 months ago
- Training and Benchmarking LLMs for Code Preference. ☆37 Updated last year
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 Updated last year
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆25 Updated 5 months ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆64 Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆36 Updated last year
- [NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live! ☆153 Updated last week
- ☆33 Updated 3 months ago
- ☆47 Updated 3 months ago
- ☆53 Updated 11 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆76 Updated 3 months ago
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆224 Updated 5 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆26 Updated last year