swe-bench / SWE-bench
SWE-bench [Multimodal]: Can Language Models Resolve Real-world GitHub Issues?
☆2,459 Updated this week
Alternatives and similar repositories for SWE-bench:
Users who are interested in SWE-bench are comparing it to the repositories listed below.
- A project-structure-aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% of tasks (pass@1) in SWE-be… ☆2,835 Updated 2 weeks ago
- Agentless🐱: an agentless approach to automatically solve software development problems ☆1,455 Updated last month
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments ☆1,621 Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆3,635 Updated 6 months ago
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks. ☆1,607 Updated 5 months ago
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ☆4,185 Updated 3 weeks ago
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆600 Updated 8 months ago
- ☆2,343 Updated 2 weeks ago
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models ☆2,676 Updated last month
- Llama-3 agents that can browse the web by following instructions and talking to you ☆1,388 Updated 2 months ago
- Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including C… ☆2,855 Updated this week
- Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering" ☆3,751 Updated 2 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,448 Updated this week
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling ☆1,612 Updated 7 months ago
- 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with Llam… ☆8,689 Updated this week
- Common interface for interacting with AI agents. The protocol is tech stack agnostic - you can use it with any framework for building age… ☆1,098 Updated last month
- SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensiv… ☆14,674 Updated this week
- A framework for prompt tuning using Intent-based Prompt Calibration ☆2,368 Updated 2 months ago
- Optimizing inference proxy for LLMs ☆2,047 Updated this week
- A library for advanced large language model reasoning ☆1,955 Updated this week
- Harness LLMs with Multi-Agent Programming ☆3,072 Updated this week
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆4,879 Updated 3 weeks ago
- AIDE: the state-of-the-art machine learning engineer agent, generating machine learning solution code from natural language descriptions. ☆747 Updated this week
- The Open Source Memory Layer For Autonomous Agents ☆2,000 Updated 4 months ago
- Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024 ☆1,369 Updated last month
- [ICLR 2025] Automated Design of Agentic Systems ☆1,190 Updated 3 weeks ago
- Task-Aware Agent-driven Prompt Optimization Framework ☆2,823 Updated last month
- Desktop app for prototyping and debugging LangGraph applications locally. ☆2,479 Updated 3 weeks ago
- AIOS: AI Agent Operating System ☆3,836 Updated this week
- Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Ge… ☆5,567 Updated this week