swe-bench/SWE-bench
SWE-bench [Multimodal]: Can Language Models Resolve Real-world Github Issues?
☆2,662 · Updated 3 weeks ago
Alternatives and similar repositories for SWE-bench:
Users who are interested in SWE-bench are comparing it to the libraries listed below:
- Agentless🐱: an agentless approach to automatically solve software development problems ☆1,584 · Updated 3 months ago
- A project structure aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% tasks (pass@1) in SWE-be… ☆2,887 · Updated this week
- Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering" ☆3,792 · Updated 3 months ago
- Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024 ☆1,406 · Updated 2 months ago
- [ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct ☆2,001 · Updated 4 months ago
- AIOS: AI Agent Operating System ☆3,965 · Updated this week
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆907 · Updated 10 months ago
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆990 · Updated last month
- This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software E… ☆1,287 · Updated last week
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments ☆1,698 · Updated 2 weeks ago
- PyTorch native post-training library ☆5,014 · Updated this week
- LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step ☆509 · Updated 6 months ago
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling ☆1,640 · Updated 8 months ago
- Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents" ☆921 · Updated last month
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆3,746 · Updated 7 months ago
- Code for the paper "Evaluating Large Language Models Trained on Code" ☆2,638 · Updated 2 months ago
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆4,957 · Updated 2 weeks ago
- ☆2,459 · Updated this week
- Common interface for interacting with AI agents. The protocol is tech stack agnostic - you can use it with any framework for building age… ☆1,146 · Updated 2 months ago
- Sky-T1: Train your own O1 preview model within $450 ☆3,142 · Updated last week
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆652 · Updated 2 months ago
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ☆3,007 · Updated last week
- Tools for merging pretrained large language models. ☆5,458 · Updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,767 · Updated 7 months ago
- [ICLR 2025] Automated Design of Agentic Systems ☆1,225 · Updated last month
- ☆367 · Updated last month
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks. ☆1,632 · Updated 6 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,568 · Updated last week
- Agentic components of the Llama Stack APIs ☆4,174 · Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,312 · Updated this week