Cognition's results and methodology on SWE-bench
☆124 · Mar 15, 2024 · Updated 2 years ago
Alternatives and similar repositories for devin-swebench-results
Users who are interested in devin-swebench-results are comparing it to the libraries listed below.
- ☆105 · Jul 17, 2024 · Updated last year
- ☆12 · Jan 31, 2024 · Updated 2 years ago
- ESEC/FSE'21: Prediction-Preserving Program Simplification ☆10 · Oct 4, 2022 · Updated 3 years ago
- ☆17 · Feb 4, 2025 · Updated last year
- A Comprehensive Benchmark for Software Development. ☆132 · May 30, 2024 · Updated 2 years ago
- Run SWE-bench evaluations remotely ☆70 · Aug 14, 2025 · Updated 10 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆528 · Jun 8, 2026 · Updated last week
- [ICML '24] R2E: Turn any GitHub Repository into a Programming Agent Environment ☆148 · Apr 20, 2025 · Updated last year
- agents.md guides agents. codingagents.md helps humans pick the right one. ☆40 · Feb 17, 2026 · Updated 3 months ago
- Code for the paper "A Structural Model for Contextual Code Changes" ☆32 · Oct 25, 2023 · Updated 2 years ago
- ☆22 · Sep 15, 2025 · Updated 9 months ago
- Repo for the paper titled “CodeGen4Libs: A Two-Stage Approach for Library-Oriented Code Generation” ☆14 · Oct 4, 2023 · Updated 2 years ago
- ☆31 · Sep 4, 2021 · Updated 4 years ago
- YaoLang: The next DSL for Yao and quantum programs. ☆31 · May 17, 2021 · Updated 5 years ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆5,175 · Apr 1, 2026 · Updated 2 months ago
- ☆37 · May 19, 2026 · Updated 3 weeks ago
- Bayesian scaling laws for in-context learning. ☆15 · Mar 12, 2025 · Updated last year
- Security Vulnerability Repair via Concolic Execution and Code Mutations ☆21 · Sep 12, 2024 · Updated last year
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆690 · Jul 29, 2025 · Updated 10 months ago
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆1,669 · May 23, 2024 · Updated 2 years ago
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆672 · Jun 8, 2026 · Updated last week
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆183 · Apr 15, 2026 · Updated 2 months ago
- JAX bindings for the flash-attention3 kernels ☆23 · Jan 2, 2026 · Updated 5 months ago
- Agentless🐱: an agentless approach to automatically solve software development problems ☆2,068 · Dec 22, 2024 · Updated last year
- ☆13 · Mar 5, 2025 · Updated last year
- A hard gym for programming ☆167 · Jul 7, 2024 · Updated last year
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback ☆12 · Jul 13, 2022 · Updated 3 years ago
- ICCV 2019 Tutorial: Global Optimization for Geometric Understanding with Provable Guarantees ☆15 · Oct 20, 2022 · Updated 3 years ago
- A compiler for BLOG probabilistic programming language ☆26 · Dec 2, 2017 · Updated 8 years ago
- Code for "Learning Structural Edits via Incremental Tree Transformations" (ICLR'21) ☆41 · Jun 20, 2021 · Updated 4 years ago
- Implement your own small language model ☆11 · Jun 15, 2024 · Updated 2 years ago
- ☆13 · Feb 15, 2023 · Updated 3 years ago
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions ☆26 · Aug 8, 2024 · Updated last year
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆249 · May 5, 2024 · Updated 2 years ago
- Fine-tune copilot based on your codebase ☆12 · Mar 26, 2024 · Updated 2 years ago
- Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg… ☆31 · Oct 10, 2025 · Updated 8 months ago
- The codes for training sparsity predictor on LLaMA. ☆18 · May 12, 2024 · Updated 2 years ago
- A hybrid quantum-classical neural network simulation platform. Quantum simulation uses QTensor, a state-of-the-art tensor network-based s… ☆14 · Jun 27, 2023 · Updated 2 years ago
- ☆11 · Jul 25, 2020 · Updated 5 years ago