symflower / eval-dev-quality
DevQualityEval: An evaluation benchmark and framework to compare and evolve the quality of code generation of LLMs.
★133 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for eval-dev-quality
- Headless IDE for AI agents · ★129 · Updated this week
- Routing on Random Forest (RoRF) · ★83 · Updated last month
- Just a bunch of benchmark logs for different LLMs · ★114 · Updated 3 months ago
- Simple examples using Argilla tools to build AI · ★38 · Updated this week
- WIP - Allows you to create DSPy pipelines using ComfyUI · ★179 · Updated 3 months ago
- Fast parallel LLM inference for MLX · ★146 · Updated 4 months ago
- Mixing Language Models with Self-Verification and Meta-Verification · ★97 · Updated last year
- look how they massacred my boy · ★54 · Updated 3 weeks ago
- ★103 · Updated 7 months ago
- ★72 · Updated last year
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX. · ★52 · Updated last week
- ★110 · Updated 2 weeks ago
- Tutorial for building an LLM router · ★157 · Updated 3 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) · ★62 · Updated this week
- A toolkit for building multimodal AI agents · ★107 · Updated 2 weeks ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… · ★201 · Updated 5 months ago
- Distributed inference for MLX LLMs · ★68 · Updated 3 months ago
- GRDN.AI app for garden optimization · ★69 · Updated 9 months ago
- ★38 · Updated 7 months ago
- Official homepage for "Self-Harmonized Chain of Thought" · ★83 · Updated last month
- GPT-4 Level Conversational QA Trained in a Few Hours · ★55 · Updated 2 months ago
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI · ★222 · Updated 6 months ago
- Code for evaluating with Flow-Judge-v0.1, an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… · ★52 · Updated last week
- Function Calling Benchmark & Testing · ★74 · Updated 4 months ago
- A simple Python sandbox for helpful LLM data agents · ★162 · Updated 4 months ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) · ★74 · Updated last month
- Solving data for LLMs - Create quality synthetic datasets! · ★136 · Updated 3 weeks ago
- ★93 · Updated 2 months ago
- ★49 · Updated 2 weeks ago
- An implementation of Self-Extend, to expand the context window via grouped attention · ★118 · Updated 10 months ago
- A new benchmark for measuring an LLM's capability to detect bugs in large codebases. · ★27 · Updated 5 months ago