lmarena / copilot-arena
☆285 · Updated 3 weeks ago
Alternatives and similar repositories for copilot-arena:
Users who are interested in copilot-arena are comparing it to the libraries listed below.
- II-Researcher: a new open-source framework designed to aid building search / research agents ☆107 · Updated this week
- Sidecar is the AI brains for the Aide editor and works alongside it, locally on your machine ☆534 · Updated last week
- Code release for Best-of-N Jailbreaking ☆459 · Updated last month
- Finetune Llama-3-8b on the MathInstruct dataset ☆108 · Updated 5 months ago
- ☆438 · Updated 5 months ago
- Contains the prompts we use to talk to various LLMs for different utilities inside the editor ☆76 · Updated last year
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆410 · Updated 3 weeks ago
- ☆375 · Updated 2 months ago
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆477 · Updated 2 weeks ago
- DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs. ☆157 · Updated this week
- Coding problems used in aider's polyglot benchmark ☆84 · Updated 3 months ago
- MCP Server to run python code locally ☆49 · Updated 3 months ago
- ☆184 · Updated 4 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆136 · Updated last week
- Letting Claude Code develop his own MCP tools :) ☆91 · Updated 3 weeks ago
- A desktop for AI agents ☆121 · Updated last week
- Commit0: Library Generation from Scratch ☆140 · Updated this week
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆157 · Updated this week
- 🤖 Headless IDE for AI agents ☆176 · Updated last month
- agent q - oss advanced reasoning and learning for autonomous ai agents ☆408 · Updated 6 months ago
- LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities. ☆99 · Updated last week
- Prompt-to-Leaderboard ☆205 · Updated 2 weeks ago
- Solving data for LLMs - Create quality synthetic datasets! ☆145 · Updated 2 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆151 · Updated this week
- ⚖️ Awesome LLM Judges ⚖️ ☆87 · Updated last month
- ☆61 · Updated last month
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆92 · Updated 5 months ago
- A user interface for DSPy ☆142 · Updated 5 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆168 · Updated 2 months ago
- ☆144 · Updated 3 weeks ago