LiveBench / liveswebench
☆31Updated last month
Alternatives and similar repositories for liveswebench
Users that are interested in liveswebench are comparing it to the libraries listed below
- Scaling Data for SWE-agents☆160Updated this week
- ☆87Updated last week
- ☆40Updated 9 months ago
- RepoQA: Evaluating Long-Context Code Understanding☆108Updated 6 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph☆164Updated last month
- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings☆36Updated 3 months ago
- Coding problems used in aider's polyglot benchmark☆115Updated 4 months ago
- ☆80Updated last month
- Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving☆157Updated this week
- ☆155Updated 8 months ago
- ☆92Updated 10 months ago
- Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents☆67Updated 3 weeks ago
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement☆88Updated 3 months ago
- Code for ScribeAgent paper☆57Updated 2 months ago
- ☆114Updated 2 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.☆182Updated this week
- SWE Arena☆33Updated last month
- Async pipelined version of Verl☆78Updated last month
- ☆45Updated last year
- Beating the GAIA benchmark with Transformers Agents. 🚀☆114Updated 2 months ago
- Simple high-throughput inference library☆46Updated this week
- General Reasoner: Advancing LLM Reasoning Across All Domains☆82Updated last week
- Official implementation of the paper "How to Understand Whole Repository? New SOTA on SWE-bench Lite (21.3%)"☆85Updated last month
- NaturalCodeBench (Findings of ACL 2024)☆64Updated 7 months ago
- Code for the curation of The Stack v2 and StarCoder2 training data☆105Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users☆222Updated 6 months ago
- A benchmark that challenges language models to code solutions for scientific problems☆119Updated this week
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆193Updated last week
- Data preparation code for CrystalCoder 7B LLM☆44Updated last year
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon!☆45Updated last month