Run SWE-bench evaluations remotely
☆60 · Aug 14, 2025 · Updated 7 months ago
Alternatives and similar repositories for sb-cli
Users who are interested in sb-cli are comparing it to the repositories listed below.
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. · ☆453 · Updated this week
- Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. · ☆255 · Feb 27, 2026 · Updated 3 weeks ago
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents · ☆597 · Updated this week
- ☆13 · Mar 5, 2025 · Updated last year
- ☆28 · Nov 10, 2025 · Updated 4 months ago
- Benchmarking Goal-Oriented Software Engineering · ☆122 · Jan 7, 2026 · Updated 2 months ago
- Artifact for TOSEM Submission: GiantRepair · ☆13 · Jun 26, 2024 · Updated last year
- Prolog implemented in Python · ☆12 · Sep 6, 2024 · Updated last year
- ☆11 · Sep 10, 2023 · Updated 2 years ago
- ☆104 · Jul 17, 2024 · Updated last year
- 🗜️ Codebase of the ACIP algorithm 🗜️ · ☆16 · Feb 11, 2026 · Updated last month
- [ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing · ☆13 · Feb 9, 2025 · Updated last year
- ☆10 · Nov 15, 2023 · Updated 2 years ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] · ☆650 · Jul 29, 2025 · Updated 7 months ago
- RAG Hallucination Detecting By LRP. · ☆11 · Mar 31, 2025 · Updated 11 months ago
- [COLM '25] Single-Pass Document Scanning for Question Answering · ☆12 · Aug 20, 2025 · Updated 7 months ago
- TSQA: Tabular Scenario Based Question Answering (AAAI 2021) · ☆18 · Dec 17, 2020 · Updated 5 years ago
- Official implementation of Panacea: A foundation model for clinical trial design, recruitment, search, and summarization. · ☆18 · Dec 24, 2024 · Updated last year
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution · ☆103 · Sep 24, 2025 · Updated 5 months ago
- ESEC/FSE'21: Prediction-Preserving Program Simplification · ☆10 · Oct 4, 2022 · Updated 3 years ago
- ChatTea: A Rust-based chat app with an async server-client setup using Tokio, using a Terminal User Interface built with ratatui. It empl… · ☆13 · Sep 16, 2024 · Updated last year
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" · ☆679 · Mar 16, 2025 · Updated last year
- SWE-bench: Can Language Models Resolve Real-world Github Issues? · ☆4,478 · Updated this week
- FSE 2023 RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair · ☆14 · Oct 23, 2024 · Updated last year
- ☆48 · Oct 28, 2025 · Updated 4 months ago
- [NAACL 2024] Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers https://arxiv.org/abs/2307.… · ☆17 · Jan 27, 2024 · Updated 2 years ago
- ☆10 · Oct 28, 2019 · Updated 6 years ago
- Notebooks for 6.S088 IAP 2023 · ☆16 · Aug 1, 2024 · Updated last year
- Lenient parser for Semantic Version numbers in Rust · ☆12 · Feb 13, 2023 · Updated 3 years ago
- r4c · ☆14 · Mar 2, 2021 · Updated 5 years ago
- ☆60 · Jan 28, 2025 · Updated last year
- ☆17 · Nov 18, 2024 · Updated last year
- Code for verifying deep neural feature ansatz · ☆22 · May 3, 2023 · Updated 2 years ago
- Code for COLING 2022 accepted paper titled "MuCDN: Mutual Conversational Detachment Network for Emotion Recognition in Multi-Party Conver… · ☆10 · Jul 21, 2023 · Updated 2 years ago
- [ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench · ☆35 · Aug 12, 2025 · Updated 7 months ago
- This comprehensive learning resource provides two complete tutorials for mastering Model Context Protocol (MCP) development with Rust. Fr… · ☆18 · Dec 1, 2025 · Updated 3 months ago
- SemBleu: A Robust Metric for AMR Parsing Evaluation · ☆12 · Feb 22, 2021 · Updated 5 years ago
- Drift-Resilient TabPFN is a method using In-Context Learning via a Prior-Data Fitted Network, to address temporal distribution shifts in … · ☆28 · May 17, 2025 · Updated 10 months ago
- ☆18 · May 27, 2025 · Updated 9 months ago