lmarena / copilot-arena
☆303 · Updated last month
Alternatives and similar repositories for copilot-arena
Users that are interested in copilot-arena are comparing it to the libraries listed below
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆214 · Updated this week
- Scaling Data for SWE-agents ☆220 · Updated this week
- ☆120 · Updated 5 months ago
- Coding problems used in aider's polyglot benchmark ☆131 · Updated 5 months ago
- Building open version of OpenAI o1 via reasoning traces (Groq, ollama, Anthropic, Gemini, OpenAI, Azure supported) Demo: https://hugging… ☆180 · Updated 7 months ago
- A benchmark for emotional intelligence in large language models ☆302 · Updated 10 months ago
- Together Open Deep Research ☆308 · Updated last month
- ☆157 · Updated 9 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆105 · Updated last month
- Arena-Hard-Auto: An automatic LLM benchmark. ☆841 · Updated last month
- Sidecar is the AI brains for the Aide editor and works alongside it, locally on your machine ☆563 · Updated 3 weeks ago
- A system that tries to resolve all issues on a GitHub repo with OpenHands. ☆109 · Updated 6 months ago
- Proof of concept of Cursor's Instant Apply feature ☆81 · Updated 9 months ago
- Prompt-to-Leaderboard ☆233 · Updated 3 weeks ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆443 · Updated 8 months ago
- GRadient-INformed MoE ☆263 · Updated 8 months ago
- Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words ☆93 · Updated this week
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆477 · Updated 3 weeks ago
- Commit0: Library Generation from Scratch ☆149 · Updated 3 weeks ago
- Open-source resources on agents for computer use. ☆343 · Updated 4 months ago
- ☆157 · Updated 10 months ago
- Harness used to benchmark aider against SWE Bench benchmarks ☆72 · Updated 11 months ago
- agent q - oss advanced reasoning and learning for autonomous ai agents ☆464 · Updated 8 months ago
- multi1: create o1-like reasoning chains with multiple AI providers (and locally). Supports LiteLLM as backend too for 100+ providers at o… ☆347 · Updated 4 months ago
- Benchmark environment for evaluating vision-language models (VLMs) on popular video games! ☆262 · Updated last week
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆519 · Updated last week
- ☆437 · Updated 8 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 · Updated 4 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 · Updated 2 months ago
- A desktop for AI agents ☆160 · Updated last week