This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?"
☆1,439 · Jul 18, 2025 · Updated 10 months ago
Alternatives and similar repositories for SWELancer-Benchmark
Users that are interested in SWELancer-Benchmark are comparing it to the libraries listed below.
- Agentless🐱: an agentless approach to automatically solve software development problems ☆2,064 · Dec 22, 2024 · Updated last year
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆5,064 · Apr 1, 2026 · Updated 2 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆686 · Jul 29, 2025 · Updated 10 months ago
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆696 · Mar 16, 2025 · Updated last year
- OpenAI Frontier Evals ☆1,212 · Apr 21, 2026 · Updated last month
- Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving ☆337 · Dec 18, 2025 · Updated 5 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆1,566 · Apr 24, 2026 · Updated last month
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆668 · Jun 1, 2026 · Updated last week
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆265 · Mar 29, 2026 · Updated 2 months ago
- SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersec… ☆19,436 · Updated this week
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents ☆601 · Aug 10, 2025 · Updated 9 months ago
- Democratizing Reinforcement Learning for LLMs ☆5,592 · Updated this week
- Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team. ☆21,580 · Apr 15, 2026 · Updated last month
- Commit0: Library Generation from Scratch ☆192 · Feb 24, 2026 · Updated 3 months ago
- ☆4,516 · Apr 22, 2026 · Updated last month
- Sky-T1: Train your own O1 preview model within $450 ☆3,390 · Jul 12, 2025 · Updated 10 months ago
- 👩‍⚖️ Agent-as-a-Judge: The Magic for Open-Endedness ☆771 · Mar 28, 2026 · Updated 2 months ago
- ☆138 · Jun 6, 2025 · Updated last year
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆290 · Jul 13, 2025 · Updated 10 months ago
- ☆640 · Sep 1, 2025 · Updated 9 months ago
- Fully open reproduction of DeepSeek-R1