symflower / eval-dev-quality
DevQualityEval: An evaluation benchmark and framework to compare and evolve the quality of code generation of LLMs.
☆143 Updated this week
Alternatives and similar repositories for eval-dev-quality:
Users who are interested in eval-dev-quality are comparing it to the repositories listed below.
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 6 months ago
- Simple examples using Argilla tools to build AI ☆53 Updated 3 months ago
- Tutorial for building LLM router ☆182 Updated 7 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆101 Updated 2 months ago
- ☆86 Updated 5 months ago
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆28 Updated this week
- Contains the prompts we use to talk to various LLMs for different utilities inside the editor ☆73 Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆89 Updated 3 weeks ago
- An automated tool for discovering insights from research paper corpora ☆136 Updated 8 months ago
- ☆111 Updated 2 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ☆82 Updated last week
- Routing on Random Forest (RoRF) ☆114 Updated 4 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆100 Updated 10 months ago
- Function Calling Benchmark & Testing ☆82 Updated 7 months ago
- Aider's refactoring benchmark exercises based on popular python repos ☆57 Updated 4 months ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆58 Updated 7 months ago
- ☆152 Updated 7 months ago
- ☆65 Updated 8 months ago
- Sphynx Hallucination Induction ☆52 Updated 3 weeks ago
- Improve your questions! The AI for Inquiry - QuestionImprover Agent is an LLM-driven “tool for thought” designed to enhance the depth and… ☆141 Updated this week
- Headless IDE for AI agents ☆163 Updated last week
- A Ruby on Rails style framework for the DSPy (Demonstrate, Search, Predict) project for Language Models like GPT, BERT, and LLama. ☆119 Updated 4 months ago
- ☆48 Updated last year
- Simple Graph Memory for AI applications ☆81 Updated 6 months ago
- GraphRAG database - hybrid graph / vector db ☆118 Updated 5 months ago
- ☆61 Updated 3 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆101 Updated last week
- A system that tries to resolve all issues on a github repo with OpenHands. ☆100 Updated 3 months ago
- ☆74 Updated last year