Cognition's results and methodology on SWE-bench
☆124 Mar 15, 2024 Updated 2 years ago
Alternatives and similar repositories for devin-swebench-results
Users that are interested in devin-swebench-results are comparing it to the libraries listed below
- ☆104 Jul 17, 2024 Updated last year
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆255 Feb 27, 2026 Updated 3 weeks ago
- ☆12 Jan 31, 2024 Updated 2 years ago
- ESEC/FSE'21: Prediction-Preserving Program Simplification ☆10 Oct 4, 2022 Updated 3 years ago
- ☆12 Mar 3, 2022 Updated 4 years ago
- ☆18 Feb 4, 2025 Updated last year
- ☆159 Aug 27, 2024 Updated last year
- ☆46 Jan 17, 2026 Updated 2 months ago
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ☆10 Jun 28, 2024 Updated last year
- A Comprehensive Benchmark for Software Development. ☆129 May 30, 2024 Updated last year
- A forest of autonomous agents. ☆20 Jan 27, 2025 Updated last year
- Run SWE-bench evaluations remotely ☆60 Aug 14, 2025 Updated 7 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆453 Updated this week
- [ICML '24] R2E: Turn any GitHub Repository into a Programming Agent Environment ☆141 Apr 20, 2025 Updated 11 months ago
- ☆21 Mar 2, 2026 Updated 2 weeks ago
- ☆133 May 8, 2025 Updated 10 months ago
- Code for the paper "A Structural Model for Contextual Code Changes" ☆32 Oct 25, 2023 Updated 2 years ago
- A clean no-jargon mathematical definition of transformer language model with a Python implementation that focuses on clarity rather than… ☆11 Jul 23, 2022 Updated 3 years ago
- repo for the paper titled “CodeGen4Libs: A Two-Stage Approach for Library-Oriented Code Generation” ☆14 Oct 4, 2023 Updated 2 years ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆4,478 Updated this week
- ☆31 Sep 4, 2021 Updated 4 years ago
- JAX bindings for the flash-attention3 kernels ☆21 Jan 2, 2026 Updated 2 months ago
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆597 Updated this week
- Bayesian scaling laws for in-context learning. ☆15 Mar 12, 2025 Updated last year
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆650 Jul 29, 2025 Updated 7 months ago
- Docker setup and creation for OpenDevin https://github.com/OpenDevin/OpenDevin ☆11 Apr 15, 2024 Updated last year
- Security Vulnerability Repair via Concolic Execution and Code Mutations ☆19 Sep 12, 2024 Updated last year
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆180 Feb 26, 2026 Updated 3 weeks ago
- A hackable library for running and fine-tuning modern transformer models on commodity and alternative GPUs, powered by tinygrad. ☆28 Feb 10, 2026 Updated last month
- Fine Tuning Multimodal LLM "Idefics 9B" on Pokemon Go Dataset available on Hugging Face. ☆18 Jan 15, 2024 Updated 2 years ago
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆1,630 May 23, 2024 Updated last year
- Agentless🐱: an agentless approach to automatically solve software development problems ☆2,019 Dec 22, 2024 Updated last year
- ☆13 Mar 5, 2025 Updated last year
- A hard gym for programming ☆164 Jul 7, 2024 Updated last year
- Andromeda Software HVNC ☆13 Apr 3, 2024 Updated last year
- Access your Ollama inference server running on your computer from anywhere. Set up with NextJS + Langchain JS LCEL + Ngrok ☆26 Feb 13, 2024 Updated 2 years ago
- A compiler for BLOG probabilistic programming language ☆26 Dec 2, 2017 Updated 8 years ago
- Code for "Learning Structural Edits via Incremental Tree Transformations" (ICLR'21) ☆41 Jun 20, 2021 Updated 4 years ago
- ☆10 Nov 14, 2024 Updated last year