lmarena / copilot-arena
☆292 Updated last month
Alternatives and similar repositories for copilot-arena:
Users who are interested in copilot-arena are comparing it to the libraries listed below:
- A desktop for AI agents ☆138 Updated this week
- Prompt-to-Leaderboard ☆218 Updated 2 weeks ago
- Coding problems used in aider's polyglot benchmark ☆108 Updated 4 months ago
- ☆436 Updated 6 months ago
- Sidecar is the AI brains for the Aide editor and works alongside it, locally on your machine ☆543 Updated 2 weeks ago
- II-Researcher: a new open-source framework designed to aid building search / research agents ☆240 Updated this week
- Open-source resources on agents for computer use. ☆321 Updated 3 months ago
- Code release for Best-of-N Jailbreaking ☆486 Updated 2 months ago
- agent q - oss advanced reasoning and learning for autonomous ai agents ☆433 Updated 7 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆162 Updated last week
- Big & Small LLMs working together ☆717 Updated this week
- ☆106 Updated 4 months ago
- ☆257 Updated 4 months ago
- ☆184 Updated 5 months ago
- Commit0: Library Generation from Scratch ☆144 Updated 2 weeks ago
- Finetune Llama-3-8b on the MathInstruct dataset ☆110 Updated 6 months ago
- Arena-Hard-Auto: An automatic LLM benchmark. ☆782 Updated this week
- Hallucination Detector is a free and open-source tool that helps you verify the accuracy of your LLM generated content instantly. ☆199 Updated 3 months ago
- the simplest self-building general autonomous agent ☆305 Updated 6 months ago
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆509 Updated last month
- Together Open Deep Research ☆242 Updated last week
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆432 Updated 7 months ago
- An agent benchmark with tasks in a simulated software company. ☆296 Updated 3 weeks ago
- Letting Claude Code develop his own MCP tools :) ☆99 Updated last month
- LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step ☆521 Updated 7 months ago
- A system that tries to resolve all issues on a github repo with OpenHands. ☆106 Updated 5 months ago
- Make any LLM to think like OpenAI o1 and deepseek R1 ☆485 Updated 2 months ago
- proof-of-concept of Cursor's Instant Apply feature ☆78 Updated 7 months ago
- Building open version of OpenAI o1 via reasoning traces (Groq, ollama, Anthropic, Gemini, OpenAI, Azure supported) Demo: https://hugging… ☆178 Updated 6 months ago
- DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs. ☆167 Updated this week