openai / SWELancer-Benchmark
This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?"
☆1,438 Updated last month
Alternatives and similar repositories for SWELancer-Benchmark
Users who are interested in SWELancer-Benchmark are comparing it to the repositories listed below.
- Releases from OpenAI Preparedness ☆846 Updated this week
- Agentless🐱: an agentless approach to automatically solve software development problems ☆1,883 Updated 8 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆910 Updated this week
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆593 Updated 5 months ago
- Renderer for the harmony response format to be used with gpt-oss ☆3,623 Updated last week
- SWE-bench [Multimodal]: Can Language Models Resolve Real-world Github Issues? ☆3,372 Updated last week
- An agent benchmark with tasks in a simulated software company. ☆534 Updated this week
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆526 Updated last month
- Code and Data for Tau-Bench ☆791 Updated last month
- [ICLR 2025] Automated Design of Agentic Systems ☆1,402 Updated 7 months ago
- The #1 open-source SWE-bench Verified implementation ☆801 Updated 2 months ago
- ☆576 Updated 2 weeks ago
- E2B Desktop Sandbox for LLMs. E2B Sandbox with desktop graphical environment that you can connect to any LLM for secure computer use. ☆1,071 Updated last week
- Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents. ☆757 Updated 3 months ago
- OO for LLMs ☆844 Updated last week
- Verifiers for LLM Reinforcement Learning ☆2,704 Updated this week
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆1,356 Updated last year
- LiveBench: A Challenging, Contamination-Free LLM Benchmark