lmarena / copilot-arenaLinks

☆315

Alternatives and similar repositories for copilot-arena

Users that are interested in copilot-arena are comparing it to the libraries listed below

Sorting:

harishsg993010 / LLM-Research-Scripts
☆433Updated 10 months ago
Aider-AI / polyglot-benchmark
Coding problems used in aider's polyglot benchmark
☆167Updated 7 months ago
All-Hands-AI / openhands-aci
Agent computer interface for AI software engineer.
☆97Updated this week
simple-bench / SimpleBench
☆146Updated 7 months ago
SWE-bench / SWE-smith
Scaling Data for SWE-agents
☆342Updated this week
lmarena / p2l
Prompt-to-Leaderboard
☆244Updated 3 months ago
pseudotensor / open-strawberry
Building open version of OpenAI o1 via reasoning traces (Groq, ollama, Anthropic, Gemini, OpenAI, Azure supported) Demo: https://hugging…
☆183Updated 9 months ago
SWE-agent / SWE-ReX
Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
☆278Updated this week
All-Hands-AI / openhands-resolver
A system that tries to resolve all issues on a github repo with OpenHands.
☆111Updated 8 months ago
llllvvuu / instant_apply
proof-of-concept of Cursor's Instant Apply feature
☆83Updated 11 months ago
togethercomputer / finetuning
Finetune Llama-3-8b on the MathInstruct dataset
☆111Updated 9 months ago
symflower / eval-dev-quality
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
☆179Updated 2 months ago
vaibhavagg303 / DARS-Agent
☆63Updated 2 months ago
togethercomputer / open_deep_research
Together Open Deep Research
☆331Updated 3 months ago
arcprize / arc-agi-benchmarking
Testing baseline LLMs performance across various models
☆293Updated 2 weeks ago
codestoryai / prompts
Contains the prompts we use to talk to various LLMs for different utilities inside the editor
☆80Updated last year
LLMSELECTOR / LLMSELECTOR
☆73Updated 5 months ago
QuixiAI / OpenChatML
☆157Updated last year
microsoft / debug-gym
A Text-Based Environment for Interactive Debugging
☆251Updated this week
qunash / r1-overthinker
Force DeepSeek r1 models to think for as long as you wish
☆368Updated 5 months ago
mshumer / OpenReasoningEngine
☆187Updated 8 months ago
microsoft / GRIN-MoE
GRadient-INformed MoE
☆264Updated 10 months ago
agora-protocol / paper-demo
☆166Updated 5 months ago
codestoryai / sidecar
Sidecar is the AI brains for the Aide editor and works alongside it, locally on your machine
☆580Updated 2 months ago
METR / eval-analysis-public
Public repository containing METR's DVC pipeline for eval data analysis
☆91Updated 4 months ago
hide-org / hide
🤖 Headless IDE for AI agents
☆196Updated 3 months ago
jd-3d / SOLOBench
☆133Updated 3 months ago
JonathanChavezTamales / llm-leaderboard
A comprehensive set of LLM benchmark scores and provider prices.
☆288Updated this week
tcsenpai / multi1
multi1: create o1-like reasoning chains with multiple AI providers (and locally). Supports LiteLLM as backend too for 100+ providers at o…
☆348Updated 6 months ago
aorwall / moatless-tools
☆533Updated last month