[ACL'25 Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline.
☆59 · Jul 21, 2025 · Updated 8 months ago
Alternatives and similar repositories for SWE-Dev
Users who are interested in SWE-Dev are comparing it to the libraries listed below.
- [DL4C @ ICLR 2025] A Benchmark for Automated Environment Setup ☆35 · Nov 9, 2025 · Updated 4 months ago
- A pytorch implementation of Abstract Syntax Networks ☆12 · Jun 27, 2025 · Updated 8 months ago
- Evaluation utilities based on SymPy. ☆22 · Dec 12, 2024 · Updated last year
- On demand communication ☆32 · Mar 3, 2026 · Updated 2 weeks ago
- Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving ☆326 · Dec 18, 2025 · Updated 3 months ago
- Official code space for "SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development" ☆61 · Oct 24, 2025 · Updated 4 months ago
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆103 · Sep 24, 2025 · Updated 5 months ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆18 · Apr 22, 2025 · Updated 11 months ago
- Reproducing R1 for Code with Reliable Rewards ☆297 · May 5, 2025 · Updated 10 months ago
- [ICML '24] R2E: Turn any GitHub Repository into a Programming Agent Environment ☆141 · Apr 20, 2025 · Updated 11 months ago
- Implementation of KDR-Agent, the AAAI 2025 accepted paper, focusing on knowledge-driven reasoning for autonomous agents. ☆18 · Nov 24, 2025 · Updated 3 months ago
- ☆13 · Mar 5, 2025 · Updated last year
- [EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning ☆15 · May 13, 2025 · Updated 10 months ago
- Environments, tools, and benchmarks for general computer agents ☆14 · Dec 3, 2024 · Updated last year
- CAN Bus Voltage Dataset for the SIMPLE paper ☆11 · Oct 2, 2019 · Updated 6 years ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆168 · Oct 11, 2024 · Updated last year
- Tools for working with the S800 corpus ☆12 · Sep 17, 2020 · Updated 5 years ago
- The benchmark proposed in paper: GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability ☆25 · Aug 12, 2025 · Updated 7 months ago
- ☆41 · Jun 19, 2024 · Updated last year
- ☆13 · Apr 26, 2023 · Updated 2 years ago
- Universal LLM security auditor with automated jailbreak testing, DSPy optimization, and OWASP 2025-aligned attack patterns ☆21 · Oct 23, 2025 · Updated 4 months ago
- ☆337 · May 24, 2025 · Updated 9 months ago
- This is the official repository of the paper "Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Schedulin… ☆13 · Jul 27, 2025 · Updated 7 months ago
- Enhanced fork of SWE-bench, tailored for OpenDevin's ecosystem. ☆28 · May 26, 2024 · Updated last year
- This is the Placeholder for Llama. Starting with Llama 3 ☆11 · May 20, 2024 · Updated last year
- the implementation of Embedding API Dependency Graph for Neural Code Generation ☆12 · Jun 6, 2021 · Updated 4 years ago
- [ICLR 2026] Official Implementation of "FeatureBench: Benchmarking Agentic Coding for Complex Feature Development" ☆35 · Mar 3, 2026 · Updated 2 weeks ago
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆27 · Dec 17, 2024 · Updated last year
- Repository with details for a ECUPrint dataset (CAN logs and CAN voltage samples) ☆20 · Oct 2, 2022 · Updated 3 years ago
- SGLang Kernel Wheel Index ☆17 · Updated this week
- ☆14 · Jan 8, 2025 · Updated last year
- Python powered music controlling webpage with websockets and bottle py (works with spotify, vlc, audacious, and others) ☆11 · Jun 9, 2017 · Updated 8 years ago
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI ☆488 · Jan 3, 2026 · Updated 2 months ago
- A simple & powerful danmaku framework. ☆14 · Mar 17, 2023 · Updated 3 years ago
- ☆109 · Jul 15, 2025 · Updated 8 months ago
- Code Snippet Recommendation from Stack Overflow Post ☆19 · Jun 30, 2021 · Updated 4 years ago
- Rick Roll Language is a rickroll based, process oriented, dynamic, strong, esoteric programming language. All of the keywords/statements … ☆11 · Jan 27, 2022 · Updated 4 years ago
- AIDE: the Machine Learning CodeGen Agent ☆25 · Oct 7, 2024 · Updated last year