Aider-AI / polyglot-benchmark
Coding problems used in aider's polyglot benchmark
☆174 · Updated 8 months ago
Alternatives and similar repositories for polyglot-benchmark
Users interested in polyglot-benchmark are comparing it to the libraries listed below.
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.☆286 · Updated last week
- Aider's refactoring benchmark exercises based on popular Python repos☆77 · Updated 10 months ago
- Agent-computer interface for an AI software engineer.☆104 · Updated this week
- Scaling Data for SWE-agents☆378 · Updated this week
- Harness used to benchmark aider against SWE Bench benchmarks☆71 · Updated last year
- Contains the prompts we use to talk to various LLMs for different utilities inside the editor☆80 · Updated last year
- DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.☆180 · Updated 3 months ago
- ☆161 · Updated 8 months ago
- Proof-of-concept of Cursor's Instant Apply feature☆83 · Updated 11 months ago
- A system that tries to resolve all issues on a GitHub repo with OpenHands.☆112 · Updated 9 months ago
- ☆108 · Updated 2 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.☆206 · Updated this week
- ☆336 · Updated 2 months ago
- ☆576 · Updated 2 weeks ago
- ☆159 · Updated last year
- A DSPy-based implementation of the tree-of-thoughts method (Yao et al., 2023) for generating persuasive arguments (see the sketch after this list)☆88 · Updated 10 months ago
- ☆41 · Updated 6 months ago
- 🤖 Headless IDE for AI agents☆200 · Updated 4 months ago
- Letting Claude Code develop its own MCP tools :)☆121 · Updated 5 months ago
- Claude Deep Research config for Claude Code.☆211 · Updated 5 months ago
- Sidecar is the AI brains for the Aide editor and works alongside it, locally on your machine☆584 · Updated 3 months ago
- ☆130 · Updated 5 months ago
- Distributed inference for MLX LLMs☆93 · Updated last year
- Train your own SOTA deductive reasoning model☆104 · Updated 5 months ago
- AI benchmark runtime framework that allows you to integrate and evaluate AI tasks using Docker-based benchmarks.☆157 · Updated 3 months ago
- ☆260 · Updated 2 months ago
- A Python library that uses Reinforcement Learning (RL) to train LLMs.☆39 · Updated 3 weeks ago
- Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.☆218 · Updated 2 weeks ago
- τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment☆219 · Updated last month
- A framework for optimizing DSPy programs with RL☆150 · Updated this week
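
For readers unfamiliar with the tree-of-thoughts pattern (Yao et al., 2023) referenced above, here is a minimal, illustrative sketch of how such a persuasive-argument generator might look in DSPy. This is not the listed repository's actual code: the model name, the `propose`/`judge` signatures, and the `tree_of_thoughts` helper are all assumptions, and a recent DSPy version with string signatures and typed output fields is assumed.

```python
import dspy

# Illustrative model choice, not taken from the repo.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# One expansion step: propose the next step of a persuasive argument.
# `branch` only makes sibling calls distinct so cached responses differ.
propose = dspy.ChainOfThought("claim, partial_argument, branch -> next_step")

# A simple LLM judge that scores how persuasive a partial argument is.
judge = dspy.ChainOfThought("claim, partial_argument -> persuasiveness_score: float")

def tree_of_thoughts(claim: str, breadth: int = 3, depth: int = 2) -> str:
    """Beam-style tree search over argument drafts: expand each frontier
    draft into `breadth` candidate continuations, score them, keep the best."""
    frontier = [""]  # start from an empty partial argument
    for _ in range(depth):
        scored = []
        for partial in frontier:
            for i in range(breadth):
                step = propose(claim=claim, partial_argument=partial,
                               branch=str(i)).next_step
                draft = (partial + "\n" + step).strip()
                score = judge(claim=claim,
                              partial_argument=draft).persuasiveness_score
                scored.append((score, draft))
        # Keep the top-scoring drafts as the next search frontier.
        frontier = [d for _, d in
                    sorted(scored, key=lambda s: s[0], reverse=True)[:breadth]]
    return frontier[0]

print(tree_of_thoughts("Cities should pedestrianize their downtown cores."))
```

The loop trades breadth for depth: raising `breadth` explores more sibling thoughts per step, while `depth` controls how many scored expansion rounds the search runs.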