Aider-AI / polyglot-benchmark
Coding problems used in aider's polyglot benchmark
☆184 Updated 10 months ago
Alternatives and similar repositories for polyglot-benchmark
Users who are interested in polyglot-benchmark are comparing it to the repositories listed below
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆339 Updated last week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆432 Updated last week
- Aider's refactoring benchmark exercises based on popular python repos ☆77 Updated last year
- Agent computer interface for AI software engineer. ☆110 Updated last month
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆202 Updated this week
- ☆170 Updated 10 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆218 Updated last week
- ☆607 Updated last month
- DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs. ☆182 Updated 5 months ago
- ☆58 Updated 9 months ago
- Harness used to benchmark aider against SWE Bench benchmarks ☆76 Updated last year
- ☆160 Updated last year
- ☆120 Updated 4 months ago
- A system that tries to resolve all issues on a github repo with OpenHands. ☆114 Updated 11 months ago
- ☆363 Updated last month
- Verify Precision of all Kimi K2 API Vendor ☆272 Updated 2 weeks ago
- Letting Claude Code develop his own MCP tools :) ☆123 Updated 7 months ago
- Prompt-to-Leaderboard ☆260 Updated 5 months ago
- A framework for optimizing DSPy programs with RL ☆208 Updated this week
- Official repository for "NoLiMa: Long-Context Evaluation Beyond Literal Matching" ☆161 Updated 3 months ago
- Train your own SOTA deductive reasoning model ☆108 Updated 7 months ago
- Contains the prompts we use to talk to various LLMs for different utilities inside the editor ☆83 Updated last year
- Building open version of OpenAI o1 via reasoning traces (Groq, ollama, Anthropic, Gemini, OpenAI, Azure supported) Demo: https://hugging… ☆184 Updated last year
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆138 Updated 8 months ago
- ☆121 Updated 5 months ago
- proof-of-concept of Cursor's Instant Apply feature ☆83 Updated last year
- A benchmark for emotional intelligence in large language models ☆369 Updated last year
- Sidecar is the AI brains for the Aide editor and works alongside it, locally on your machine ☆588 Updated 5 months ago
- LLMProc: Unix-inspired runtime that treats LLMs as processes. ☆33 Updated 3 months ago
- Fast parallel LLM inference for MLX ☆223 Updated last year