swe-bench / SWE-bench
[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world Github Issues?
☆2,288Updated this week
Alternatives and similar repositories for SWE-bench:
Users who are interested in SWE-bench are comparing it to the libraries listed below.
- A project structure aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% tasks (pass@1) in SWE-be…☆2,793Updated last week
- Agentless🐱: an agentless approach to automatically solve software development problems☆1,281Updated 3 weeks ago
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments☆1,528Updated this week
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach…☆4,812Updated last month
- Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"☆3,720Updated last month
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality☆3,442Updated 5 months ago
- ☆2,180Updated last week
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling☆1,580Updated 6 months ago
- [ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct☆1,992Updated 2 months ago
- AIOS: AI Agent Operating System☆3,676Updated last week
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.☆1,569Updated 4 months ago
- LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step☆472Updated 4 months ago
- Llama-3 agents that can browse the web by following instructions and talking to you☆1,382Updated last month
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.☆2,908Updated this week
- Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"☆812Updated 3 weeks ago
- OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophist…☆1,622Updated 8 months ago
- A code-first agent framework for seamlessly planning and executing data analytics tasks.☆5,458Updated this week
- A benchmark to evaluate language models on questions I've previously asked them to solve.☆949Updated 2 months ago
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan…☆548Updated 7 months ago
- [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models"☆724Updated 5 months ago
- SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensiv…☆14,188Updated this week
- Common interface for interacting with AI agents. The protocol is tech stack agnostic - you can use it with any framework for building age…☆1,064Updated last week
- Home of StarCoder2!☆1,827Updated 9 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy☆1,654Updated 5 months ago
- ☆4,050Updated 7 months ago
- Large Action Model framework to develop AI Web Agents☆5,807Updated 2 months ago
- Automated Design of Agentic Systems☆1,135Updated last week
- Harness LLMs with Multi-Agent Programming☆2,917Updated this week
- Supercharge Your LLM Application Evaluations 🚀☆7,889Updated this week
- A framework for prompt tuning using Intent-based Prompt Calibration☆2,311Updated last month