openai / SWELancer-Benchmark
This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?"
☆1,353 Updated 3 weeks ago
Alternatives and similar repositories for SWELancer-Benchmark
Users who are interested in SWELancer-Benchmark are comparing it to the libraries listed below:
- Releases from OpenAI Preparedness ☆704 Updated 2 weeks ago
- Agent S: an open agentic framework that uses computers like a human ☆2,436 Updated this week
- Agentless🐱: an agentless approach to automatically solve software development problems ☆1,639 Updated 4 months ago
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆509 Updated last month
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,076 Updated 3 months ago
- AI computer use powered by open source LLMs and E2B Desktop Sandbox ☆1,055 Updated last month
- An agent benchmark with tasks in a simulated software company. ☆294 Updated 2 weeks ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆438 Updated 3 weeks ago
- Verifiers for LLM Reinforcement Learning ☆827 Updated 3 weeks ago
- Synthetic data curation for post-training and structured data extraction ☆1,257 Updated this week
- The #1 open-source SWE-bench Verified implementation ☆428 Updated 2 weeks ago
- ☆544 Updated 3 weeks ago
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ☆1,043 Updated 2 months ago
- A pattern for an always on AI Assistant powered by Deepseek-V3, RealtimeSTT, and Typer for engineering ☆897 Updated 3 months ago
- Make Mac apps accessible for AI agents ☆947 Updated last month
- Make any LLM to think like OpenAI o1 and deepseek R1 ☆485 Updated 2 months ago
- Keep searching, reading webpages, reasoning until it finds the answer (or exceeding the token budget) ☆4,041 Updated this week
- [ICLR 2025] Automated Design of Agentic Systems ☆1,258 Updated 2 months ago
- OctoTools: An agentic framework with extensible tools for complex reasoning ☆1,090 Updated this week
- A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools. ☆1,174 Updated 3 weeks ago
- Sky-T1: Train your own O1 preview model within $450 ☆3,220 Updated this week
- Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents. ☆669 Updated last month
- Big & Small LLMs working together ☆717 Updated this week
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆1,097 Updated 11 months ago
- ☆1,690 Updated 3 weeks ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆685 Updated last week
- AIDE: AI-Driven Exploration in the Space of Code. State of the Art machine Learning engineering agents that automates AI R&D. ☆859 Updated last week
- Reasoning Augmented Generation ☆841 Updated 2 months ago
- MLGym A New Framework and Benchmark for Advancing AI Research Agents ☆484 Updated 2 weeks ago
- The NVIDIA AgentIQ toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents. ☆723 Updated this week