DevQualityEval: An evaluation benchmark and framework to compare and evolve the quality of code generation of LLMs.
★185 · May 15, 2025 · Updated last year
Alternatives and similar repositories for eval-dev-quality
Users that are interested in eval-dev-quality are comparing it to the libraries listed below.
- Whether you're using LLMs or not, Symflower helps you build better software by pairing static, dynamic and symbolic analyses with LLMs. T… ★24 · Jun 30, 2025 · Updated 11 months ago
- Unit test generation for the Kakoune editor with Symflower ★12 · Oct 17, 2022 · Updated 3 years ago
- Additional functionality for Go's os package ★17 · Mar 16, 2025 · Updated last year
- A daily benchmark to regression-test cloud LLMs ★19 · Aug 7, 2025 · Updated 9 months ago
- Paper-reading notes for Berkeley OS prelim exam. ★14 · Aug 28, 2024 · Updated last year
- An alternative way of calculating self-attention ★18 · May 25, 2024 · Updated 2 years ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ★11 · Sep 14, 2025 · Updated 8 months ago
- Advanced Reasoning Benchmark Dataset for LLMs ★47 · Nov 19, 2023 · Updated 2 years ago
- Evals for agents ★15 · Dec 4, 2024 · Updated last year
- ReasonFlow is a novel framework designed to implement o1-like reasoning capabilities in large language models. ★19 · Feb 25, 2025 · Updated last year
- Genetics for Language Models ★17 · Jul 1, 2024 · Updated last year
- Bayesian scaling laws for in-context learning. ★15 · Mar 12, 2025 · Updated last year
- Set of scripts to finetune LLMs ★38 · Mar 30, 2024 · Updated 2 years ago
- A Flutter plugin for integrating Liquid AI's LEAP SDK, enabling on-device deployment of small language models in Flutter applications. ★23 · Sep 3, 2025 · Updated 8 months ago
- ★18 · Apr 15, 2024 · Updated 2 years ago
- A personalised language learning app ★18 · Oct 9, 2024 · Updated last year
- An unofficial implementation of SOLAR-10.7B model and the newly proposed interlocked-DUS (iDUS) implementation and experiment details. ★14 · Mar 20, 2024 · Updated 2 years ago
- MCP server for YouTube ★19 · Mar 15, 2025 · Updated last year
- ★74 · Sep 5, 2023 · Updated 2 years ago
- Project code for training LLMs to write better unit tests + code ★22 · May 19, 2025 · Updated last year
- Small, simple agent task environments for training and evaluation ★19 · Nov 1, 2024 · Updated last year
- LLM benchmarks ★13 · Feb 22, 2024 · Updated 2 years ago
- The official evaluation suite and dynamic data release for MixEval. ★255 · Nov 10, 2024 · Updated last year
- A collection of libraries in Scheme ★13 · Mar 2, 2021 · Updated 5 years ago
- Cursor system prompt repository ★28 · Nov 25, 2024 · Updated last year
- Simple Model Similarities Analysis ★21 · Feb 3, 2024 · Updated 2 years ago
- Repo for "AlphaResearch: Accelerating New Algorithm Discovery with Language Models" ★56 · Nov 12, 2025 · Updated 6 months ago
- A Modified Validator for the Diet Client ★11 · Jun 14, 2024 · Updated last year
- ★17 · May 8, 2024 · Updated 2 years ago
- ★11 · Jan 9, 2025 · Updated last year
- A sample pattern for running CI tests on Modal ★19 · Apr 12, 2025 · Updated last year
- ★28 · Apr 2, 2025 · Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ★50 · Dec 22, 2023 · Updated 2 years ago
- Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words ★226 · Updated this week
- Just a bunch of benchmark logs for different LLMs ★127 · Jul 28, 2024 · Updated last year
- NaturalCodeBench (Findings of ACL 2024) ★70 · Oct 14, 2024 · Updated last year
- ★12 · Nov 5, 2024 · Updated last year
- Self-evaluating interview for AI coders ★600 · Jun 21, 2025 · Updated 11 months ago
- ★28 · Nov 10, 2025 · Updated 6 months ago