Aider-AI / polyglot-benchmark
Coding problems used in aider's polyglot benchmark
☆162 · Updated 7 months ago
Alternatives and similar repositories for polyglot-benchmark
Users who are interested in polyglot-benchmark are comparing it to the libraries listed below.
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆273 · Updated last week
- Aider's refactoring benchmark exercises based on popular python repos ☆76 · Updated 9 months ago
- Agent computer interface for AI software engineer. ☆92 · Updated last week
- Scaling Data for SWE-agents ☆328 · Updated last week
- Sidecar is the AI brains for the Aide editor and works alongside it, locally on your machine ☆580 · Updated 2 months ago
- Contains the prompts we use to talk to various LLMs for different utilities inside the editor ☆79 · Updated last year
- ☆99 · Updated 2 months ago
- A SQL-like language for efficient code analysis and transformations ☆35 · Updated 6 months ago
- ☆138 · Updated 7 months ago
- Letting Claude Code develop his own MCP tools :) ☆122 · Updated 4 months ago
- ☆331 · Updated last month
- A Text-Based Environment for Interactive Debugging ☆250 · Updated this week
- Harness used to benchmark aider against SWE Bench benchmarks ☆72 · Updated last year
- Open-source resources on agents for computer use. ☆364 · Updated 6 months ago
- A system that tries to resolve all issues on a github repo with OpenHands. ☆110 · Updated 8 months ago
- Claude Deep Research config for Claude Code. ☆205 · Updated 4 months ago
- The Showdown Computer Control Evaluation Suite ☆83 · Updated 4 months ago
- proof-of-concept of Cursor's Instant Apply feature ☆83 · Updated 11 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆201 · Updated 3 weeks ago
- Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers. ☆188 · Updated 3 weeks ago
- ☆533 · Updated last month
- A framework for optimizing DSPy programs with RL ☆96 · Updated this week
- 🤖 Headless IDE for AI agents ☆196 · Updated 3 months ago
- DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs. ☆179 · Updated 2 months ago
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆39 · Updated 3 weeks ago
- Building open version of OpenAI o1 via reasoning traces (Groq, ollama, Anthropic, Gemini, OpenAI, Azure supported) Demo: https://hugging… ☆183 · Updated 9 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆87 · Updated 10 months ago
- ☆315 · Updated 3 months ago
- ☆159 · Updated 11 months ago
- Plug-and-play tree search for agents ☆259 · Updated last week