lmarena / copilot-arena
☆312Updated 2 months ago
Alternatives and similar repositories for copilot-arena
Users who are interested in copilot-arena are comparing it to the repositories listed below
- Coding problems used in aider's polyglot benchmark☆155Updated 6 months ago
- Finetune Llama-3-8b on the MathInstruct dataset☆110Updated 9 months ago
- GRadient-INformed MoE☆263Updated 9 months ago
- ☆434Updated 9 months ago
- A system that tries to resolve all issues on a github repo with OpenHands.☆110Updated 8 months ago
- Together Open Deep Research☆320Updated 3 months ago
- ☆132Updated 6 months ago
- Prompt-to-Leaderboard☆241Updated 2 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.☆247Updated this week
- DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.☆178Updated 2 months ago
- Agent computer interface for AI software engineer.☆89Updated this week
- Scaling Data for SWE-agents☆293Updated this week
- Open-source resources on agents for computer use.☆357Updated 5 months ago
- Open Agent Computer Interface☆77Updated 7 months ago
- ☆183Updated 7 months ago
- ☆162Updated 4 months ago
- A benchmark for emotional intelligence in large language models☆323Updated 11 months ago
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication☆60Updated 7 months ago
- An agent benchmark with tasks in a simulated software company.☆488Updated last week
- Testing baseline LLMs performance across various models☆284Updated this week
- ☆71Updated 4 months ago
- Tutorial for building LLM router☆217Updated last year
- ☆315Updated 7 months ago
- Building open version of OpenAI o1 via reasoning traces (Groq, ollama, Anthropic, Gemini, OpenAI, Azure supported) Demo: https://hugging…☆181Updated 9 months ago
- Force DeepSeek r1 models to think for as long as you wish☆369Updated 5 months ago
- A Text-Based Environment for Interactive Debugging☆236Updated this week
- An automated tool for discovering insights from research paper corpora☆138Updated last year
- A user interface for DSPy☆162Updated last month
- ☆64Updated last month
- ☆62Updated 8 months ago