SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
☆283 Feb 11, 2026 Updated 2 weeks ago
Alternatives and similar repositories for SWE-bench_Pro-os
Users who are interested in SWE-bench_Pro-os compare it to the repositories listed below.
- SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner ☆33 Jun 29, 2025 Updated 8 months ago
- ☆27 Updated this week
- quasar-actors-integration-examples ☆11 Apr 27, 2016 Updated 9 years ago
- The official Python library for Formulaic ☆18 Apr 25, 2024 Updated last year
- ☆32 Jan 25, 2026 Updated last month
- ☆132 May 8, 2025 Updated 9 months ago
- ☆21 Dec 25, 2025 Updated 2 months ago
- Basic structures for finite elements based on ExtendableGrids infrastructure ☆23 Jan 29, 2026 Updated last month
- [NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live! ☆165 Feb 25, 2026 Updated last week
- Beating the `bisect` module's implementation using C-extensions. ☆32 May 19, 2023 Updated 2 years ago
- CLI tool for stateful random access of file streams ☆26 Dec 21, 2025 Updated 2 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆644 Jul 29, 2025 Updated 7 months ago
- An intelligent tuner for vLLM that automatically monitors GPU metrics, uses Bayesian optimization to tune parameters ☆47 Updated this week
- [ICML25] CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale ☆25 Jul 31, 2025 Updated 7 months ago
- All-in-one benchmarking platform for evaluating LLM. ☆15 Nov 12, 2025 Updated 3 months ago
- Yeti ergonomic split keyboard ☆22 Jul 15, 2024 Updated last year
- Benchmark ClassEval for class-level code generation. ☆145 Oct 24, 2024 Updated last year
- This project provides a setup using Docker to create a service running Tor and a lighttpd web server. The .onion address remains persiste… ☆16 Jun 17, 2025 Updated 8 months ago
- Rust SDK for S2, the durable streams API ☆42 Feb 16, 2026 Updated 2 weeks ago
- SWE-Exp: Experience-Driven Software Issue Resolution ☆35 Oct 17, 2025 Updated 4 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆18 Aug 22, 2024 Updated last year
- ☆20 Oct 10, 2025 Updated 4 months ago
- ☆132 Jun 6, 2025 Updated 8 months ago
- Search, browse, and resume your Claude Code sessions. Fast. ☆45 Updated this week
- Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving ☆323 Dec 18, 2025 Updated 2 months ago
- A benchmark for LLMs on complicated tasks in the terminal ☆1,614 Jan 22, 2026 Updated last month
- Official Implementation of "Simulating Environments with Reasoning Models for Agent Training" ☆56 Feb 18, 2026 Updated last week
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers" ☆98 Oct 27, 2025 Updated 4 months ago
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI ☆483 Jan 3, 2026 Updated 2 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆29 May 22, 2025 Updated 9 months ago
- Run Claude Code or OpenAI Codex in the background ☆32 Updated this week
- Compression suite for data frames and tabular data files, csv, excel etc. Using LZHW algorithm. ☆30 Aug 17, 2024 Updated last year
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 Aug 3, 2024 Updated last year
- Ludic – an LLM-RL library for the era of experience ☆60 Jan 9, 2026 Updated last month
- Publish and install private python packages using OCI/docker registries. ☆97 Feb 22, 2026 Updated last week
- Must-read papers on Repository-level Code Generation & Issue Resolution 🔥 ☆259 Dec 22, 2025 Updated 2 months ago
- [ACL25] FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation ☆44 Jan 28, 2026 Updated last month
- Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claud… ☆31 Mar 20, 2025 Updated 11 months ago
- Various implementation of byte matrix multiplication ☆26 Jan 10, 2025 Updated last year