symflower / eval-dev-quality
DevQualityEval: An evaluation benchmark and framework to compare and evolve the quality of code generation of LLMs.
★ 137 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for eval-dev-quality
- Tutorial for building an LLM router · ★ 163 · Updated 4 months ago
- WIP: allows you to create DSPy pipelines using ComfyUI · ★ 180 · Updated 3 months ago
- Headless IDE for AI agents · ★ 133 · Updated this week
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX · ★ 55 · Updated last week
- Just a bunch of benchmark logs for different LLMs · ★ 115 · Updated 3 months ago
- Contains the prompts we use to talk to various LLMs for different utilities inside the editor · ★ 62 · Updated 9 months ago
- Simple examples using Argilla tools to build AI · ★ 42 · Updated this week
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, and EXL2 · ★ 126 · Updated 6 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vectors · ★ 203 · Updated 6 months ago
- Routing on Random Forest (RoRF) · ★ 84 · Updated last month
- Distributed inference for MLX LLMs · ★ 70 · Updated 3 months ago
- A collection of prompts to challenge the reasoning abilities of large language models in the presence of misleading information · ★ 123 · Updated this week
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI · ★ 221 · Updated 6 months ago
- Fast parallel LLM inference for MLX · ★ 149 · Updated 4 months ago
- Function Calling Benchmark & Testing · ★ 75 · Updated 4 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph · ★ 96 · Updated 2 months ago
- This repository includes the official implementation of OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs · ★ 99 · Updated this week
- A new benchmark for measuring an LLM's capability to detect bugs in large codebases · ★ 27 · Updated 5 months ago
- Let's create synthetic textbooks together :) · ★ 70 · Updated 9 months ago
- A prompting library · ★ 128 · Updated last month