swe-bench / SWE-bench
SWE-bench [Multimodal]: Can Language Models Resolve Real-world GitHub Issues?
☆2,892 · Updated this week
Alternatives and similar repositories for SWE-bench:
Users who are interested in SWE-bench are comparing it to the libraries listed below.
- A project structure aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% tasks (pass@1) in SWE-be… ☆2,928 · Updated 2 weeks ago
- Agentless🐱: an agentless approach to automatically solve software development problems ☆1,656 · Updated 4 months ago
- ☆2,780 · Updated 2 weeks ago
- AIOS: AI Agent Operating System ☆4,114 · Updated last week
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆1,126 · Updated 11 months ago
- Modeling, training, eval, and inference code for OLMo ☆5,560 · Updated last week
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,048 · Updated last month
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,843 · Updated 8 months ago
- [ICLR 2025] Automated Design of Agentic Systems ☆1,283 · Updated 3 months ago
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆3,884 · Updated 8 months ago
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling ☆1,676 · Updated 9 months ago
- Code for the paper "Evaluating Large Language Models Trained on Code" ☆2,719 · Updated 3 months ago
- SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersec… ☆15,668 · Updated this week
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models ☆2,741 · Updated 4 months ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆2,529 · Updated 3 months ago
- LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step ☆525 · Updated 7 months ago
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments ☆1,822 · Updated last week
- Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering" ☆3,824 · Updated 5 months ago
- AllenAI's post-training codebase ☆2,942 · Updated this week
- TextGrad: Automatic "Differentiation" via Text, using large language models to backpropagate textual gradients. ☆2,505 · Updated last month
- A collection of notebooks/recipes showcasing some fun and effective ways of using Claude. ☆12,060 · Updated 3 weeks ago
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ☆4,213 · Updated 3 months ago
- A library for advanced large language model reasoning ☆2,116 · Updated 3 weeks ago
- This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software E… ☆1,359 · Updated last month
- An Open Large Reasoning Model for Real-World Solutions ☆1,488 · Updated 2 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,671 · Updated last week
- PyTorch native post-training library ☆5,154 · Updated this week
- Tools for merging pretrained large language models. ☆5,628 · Updated this week
- verl: Volcano Engine Reinforcement Learning for LLMs ☆7,626 · Updated this week
- A curated list of awesome LLM agent frameworks. ☆913 · Updated this week