DevQualityEval: An evaluation benchmark and framework to compare and evolve the quality of code generation of LLMs.
☆185 · May 15, 2025 · Updated 9 months ago
Alternatives and similar repositories for eval-dev-quality
Users who are interested in eval-dev-quality are comparing it to the libraries listed below.
- Whether you're using LLMs or not, Symflower helps you build better software by pairing static, dynamic and symbolic analyses with LLMs. T… ☆23 · Jun 30, 2025 · Updated 8 months ago
- Unit test generation for the Kakoune editor with Symflower ☆12 · Oct 17, 2022 · Updated 3 years ago
- Alternative way of calculating self-attention ☆18 · May 25, 2024 · Updated last year
- ☆18 · Apr 15, 2024 · Updated last year
- ☆28 · Nov 10, 2025 · Updated 3 months ago
- A constraint programming solver. ☆10 · Jan 10, 2024 · Updated 2 years ago
- A collection of libraries in Scheme ☆13 · Mar 2, 2021 · Updated 5 years ago
- Evals for agents ☆14 · Dec 4, 2024 · Updated last year
- Bayesian scaling laws for in-context learning. ☆15 · Mar 12, 2025 · Updated 11 months ago
- ☆12 · May 20, 2025 · Updated 9 months ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆10 · Sep 14, 2025 · Updated 5 months ago
- ☆12 · Nov 5, 2024 · Updated last year
- A daily benchmark to regression-test cloud LLMs ☆17 · Aug 7, 2025 · Updated 6 months ago
- LLM benchmarks ☆13 · Feb 22, 2024 · Updated 2 years ago
- A Flutter plugin for integrating Liquid AI's LEAP SDK, enabling on-device deployment of small language models in Flutter applications. ☆23 · Sep 3, 2025 · Updated 6 months ago
- A Modified Validator for the Diet Client ☆11 · Jun 14, 2024 · Updated last year
- A full stack typescript SAAS boilerplate with Chat, Auth (Langgraph, supabase), Payments (stripe), and AI Credits ☆17 · May 23, 2025 · Updated 9 months ago
- SK Multi agentic advanced orchestration example ☆15 · Feb 20, 2026 · Updated 2 weeks ago
- ☆11 · Jan 9, 2025 · Updated last year
- The official evaluation suite and dynamic data release for MixEval. ☆255 · Nov 10, 2024 · Updated last year
- Testing LLM reasoning abilities with lineage relationship quizzes. ☆36 · Feb 2, 2026 · Updated last month
- Creates an Azure AI Service and deploys the specified models. ☆18 · Aug 22, 2025 · Updated 6 months ago
- Just a bunch of benchmark logs for different LLMs ☆119 · Jul 28, 2024 · Updated last year
- Model Server Template. Used to expose custom models to the LangSmith Playground ☆17 · Jun 14, 2024 · Updated last year
- Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners" ☆14 · Sep 28, 2024 · Updated last year
- Rust Back testing framework for Databento ☆22 · Jan 13, 2026 · Updated last month
- An unofficial implementation of SOLAR-10.7B model and the newly proposed interlocked-DUS(iDUS) implementation and experiment details. ☆14 · Mar 20, 2024 · Updated last year
- ReasonFlow is a novel framework designed to implement o1-like reasoning capabilities in large language models. ☆19 · Feb 25, 2025 · Updated last year
- Run SWE-bench evaluations remotely ☆58 · Aug 14, 2025 · Updated 6 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆251 · Feb 27, 2026 · Updated last week
- A set of scripts to easily convert all training data from huggingface into alpaca instruct or sharegpt format, which should allow for eas… ☆18 · Mar 14, 2025 · Updated 11 months ago
- Genetics for Language Models ☆17 · Jul 1, 2024 · Updated last year
- Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words ☆198 · Updated this week
- A sample pattern for running CI tests on Modal ☆19 · Apr 12, 2025 · Updated 10 months ago
- ☆104 · Jul 17, 2024 · Updated last year
- Set of scripts to finetune LLMs ☆38 · Mar 30, 2024 · Updated last year
- Almost backwards compatible alternative to Clojure 1.8.0 implementation of multimethods with roughly 1/10 the method lookup cost. ☆14 · Aug 1, 2023 · Updated 2 years ago
- NaturalCodeBench (Findings of ACL 2024) ☆68 · Oct 14, 2024 · Updated last year
- Pi skills to replicate AmpCode experience - handoffs, modes, permissions, web access ☆118 · Feb 27, 2026 · Updated last week