princeton-nlp / SWE-bench

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world Github Issues?

☆2,038

Alternatives and similar repositories for SWE-bench:

Users who are interested in SWE-bench are comparing it to the repositories listed below.

nus-apr / auto-code-rover
A project-structure-aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% of tasks (pass@1) in SWE-be…
☆2,748 Updated this week
SqueezeAILab / LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
☆1,532 Updated 4 months ago
Codium-ai / AlphaCodium
Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"
☆3,662 Updated last week
openai / simple-evals
☆1,998 Updated last week
agiresearch / AIOS
AIOS: AI Agent Operating System
☆3,457 Updated this week
xlang-ai / OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
☆1,418 Updated this week
OpenAutoCoder / Agentless
Agentless🐱: an agentless approach to automatically solve software development problems
☆749 Updated 3 weeks ago
microsoft / LLMLingua
[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach…
☆4,686 Updated 2 weeks ago
togethercomputer / MoA
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
☆2,607 Updated last month
ise-uiuc / magicoder
[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct
☆1,981 Updated last month
microsoft / aici
AICI: Prompts as (Wasm) Programs
☆1,957 Updated 3 weeks ago
lm-sys / RouteLLM
A framework for serving and evaluating LLM routers: save LLM costs without compromising quality!
☆3,303 Updated 3 months ago
xingyaoww / code-act
Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan…
☆507 Updated 6 months ago
McGill-NLP / webllama
Llama-3 agents that can browse the web by following instructions and talking to you
☆1,356 Updated 4 months ago
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆4,881 Updated this week
evalplus / evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
☆1,268 Updated 2 weeks ago
openai / human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
☆2,432 Updated 9 months ago
princeton-nlp / SWE-agent
[NeurIPS 2024] SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employ…
☆13,789 Updated last week
gkamradt / LLMTest_NeedleInAHaystack
Doing simple retrieval from LLMs at various context lengths to measure accuracy
☆1,592 Updated 3 months ago
OpenCodeInterpreter / OpenCodeInterpreter
OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophist…
☆1,601 Updated 6 months ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆2,071 Updated this week
microsoft / promptbench
A unified evaluation framework for large language models
☆2,478 Updated last month
myshell-ai / JetMoE
Reaching LLaMA2 Performance with 0.1M Dollars
☆961 Updated 4 months ago
openai / transformer-debugger
☆4,038 Updated 5 months ago
cohere-ai / cohere-toolkit
Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.
☆2,856 Updated this week
sgl-project / sglang
SGLang is a fast serving framework for large language models and vision language models.
☆6,291 Updated this week
allenai / open-instruct
☆1,916 Updated this week
lucidrains / self-rewarding-lm-pytorch
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
☆1,340 Updated 7 months ago
OS-Copilot / OS-Copilot
A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.
☆1,532 Updated 2 months ago