symflower / eval-dev-quality
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
☆169 Updated last week
Alternatives and similar repositories for eval-dev-quality
Users that are interested in eval-dev-quality are comparing it to the libraries listed below
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆182 Updated this week
- LangEvals aggregates various language model evaluators into a single platform, providing a standard interface for a multitude of scores a… ☆56 Updated this week
- A user interface for DSPy ☆146 Updated 6 months ago
- Simple examples using Argilla tools to build AI ☆52 Updated 5 months ago
- II-Researcher: a new open-source framework designed to aid building search / research agents ☆248 Updated last week
- Scaling Data for SWE-agents ☆160 Updated this week
- Code for ScribeAgent paper ☆57 Updated 2 months ago
- ☆93 Updated 8 months ago
- ☆114 Updated 4 months ago
- Contains the prompts we use to talk to various LLMs for different utilities inside the editor ☆76 Updated last year
- ☆150 Updated 2 months ago
- 🤖 Headless IDE for AI agents ☆186 Updated 3 weeks ago
- ☆65 Updated 2 months ago
- Scripts to create your own MoE models using mlx ☆89 Updated last year
- ☆101 Updated 8 months ago
- CursorCore: Assist Programming through Aligning Anything ☆123 Updated 3 months ago
- ☆72 Updated last week
- ☆155 Updated 8 months ago
- Tutorial for building LLM router ☆202 Updated 9 months ago
- ☆130 Updated 2 weeks ago
- Routing on Random Forest (RoRF) ☆153 Updated 7 months ago
- ☆85 Updated 7 months ago
- Keeping my personal experiments separate from the main repo ☆65 Updated 3 months ago
- Distributed Inference for mlx LLM ☆91 Updated 9 months ago
- Letting Claude Code develop his own MCP tools :) ☆100 Updated 2 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆104 Updated 5 months ago
- LLM reads a paper and produces a working prototype ☆56 Updated last month
- ☆138 Updated last month
- Run AI generated code in isolated sandboxes ☆71 Updated 3 months ago
- A system that tries to resolve all issues on a GitHub repo with OpenHands. ☆108 Updated 5 months ago