CognitionAI / devin-swebench-results
Cognition's results and methodology on SWE-bench
☆123 · Mar 15, 2024 · Updated last year
Alternatives and similar repositories for devin-swebench-results
Users who are interested in devin-swebench-results are comparing it to the libraries listed below.
- ☆104 · Jul 17, 2024 · Updated last year
- Harness used to benchmark aider against SWE Bench benchmarks ☆79 · Jun 27, 2024 · Updated last year
- ☆45 · Jan 17, 2026 · Updated 3 weeks ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆246 · Jan 31, 2026 · Updated last week
- ESEC/FSE'21: Prediction-Preserving Program Simplification ☆10 · Oct 4, 2022 · Updated 3 years ago
- ☆17 · Dec 10, 2025 · Updated 2 months ago
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ☆10 · Jun 28, 2024 · Updated last year
- A Datasette instance for searching WebVid-10M ☆15 · Sep 30, 2022 · Updated 3 years ago
- ☆159 · Aug 27, 2024 · Updated last year
- A Comprehensive Benchmark for Software Development. ☆127 · May 30, 2024 · Updated last year
- Run SWE-bench evaluations remotely ☆53 · Aug 14, 2025 · Updated 5 months ago
- A zero-shot captcha solver. ☆16 · Dec 22, 2023 · Updated 2 years ago
- The codes for training sparsity predictor on LLaMA. ☆18 · May 12, 2024 · Updated last year
- Implementation of 'Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis', in MLX ☆23 · Oct 30, 2024 · Updated last year
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆4,267 · Feb 3, 2026 · Updated last week
- A simple Flask app that lets you text back and forth with Open Interpreter. Probably a bad idea. ☆22 · Oct 7, 2023 · Updated 2 years ago
- A collection of flake templates as starting points for your awesome projects ☆18 · Jun 23, 2025 · Updated 7 months ago
- ☆17 · Feb 4, 2025 · Updated last year
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆430 · Feb 4, 2026 · Updated last week
- Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in MLX ☆21 · Oct 8, 2024 · Updated last year
- Simple example client demonstrating how to connect to MCP servers over HTTP (SSE) ☆17 · Jan 8, 2025 · Updated last year
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆551 · Updated this week
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions ☆25 · Aug 8, 2024 · Updated last year
- Enhanced fork of SWE-bench, tailored for OpenDevin's ecosystem. ☆28 · May 26, 2024 · Updated last year
- Text with Open Interpreter, running locally on your Mac. Credit: Morisy ☆23 · Oct 6, 2023 · Updated 2 years ago
- Some examples of Lean projects, for undergraduate mathematicians. ☆23 · Jun 7, 2021 · Updated 4 years ago
- ☆43 · Dec 16, 2025 · Updated last month
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆1,579 · May 23, 2024 · Updated last year
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆240 · May 5, 2024 · Updated last year
- A Cross-Domain Transferable Neural Coherence Model https://arxiv.org/abs/1905.11912 ☆24 · Jul 8, 2020 · Updated 5 years ago
- speech-to-speech-demo ☆24 · Dec 12, 2025 · Updated 2 months ago
- ☆132 · May 8, 2025 · Updated 9 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆627 · Jul 29, 2025 · Updated 6 months ago
- AI Services API: serves langchain, huggingface, & other emergent python AI libraries as a service. This project mainly serves LibreChat, … ☆33 · Jul 24, 2023 · Updated 2 years ago
- ☆32 · Nov 15, 2022 · Updated 3 years ago
- ☆33 · Oct 18, 2023 · Updated 2 years ago
- MiniMax-Provider-Verifier offers a rigorous, vendor-agnostic way to verify whether third-party deployments of the Minimax M2 model are co… ☆23 · Jan 15, 2026 · Updated 3 weeks ago
- GPI-Space: Memory Driven Computing and Big Data ☆10 · Jan 2, 2025 · Updated last year
- Plugin QGIS ☆10 · Jan 16, 2023 · Updated 3 years ago