GoogleCloudPlatform / evalbench
EvalBench is a flexible framework designed to measure the quality of generative AI (GenAI) workflows around database-specific tasks.
☆24Updated this week
Alternatives and similar repositories for evalbench
Users who are interested in evalbench are comparing it to the libraries listed below.
- DSPy in action with open-source LLMs.☆102Updated last year
- ☆127Updated last month
- Code and data for the paper "DBCᴏᴘɪʟᴏᴛ: Natural Language Querying over Massive Database via Schema Routing" (EDBT 2025)☆127Updated 3 months ago
- ☆44Updated 3 weeks ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM.☆138Updated 3 months ago
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation☆82Updated 6 months ago
- SUQL: Conversational Search over Structured and Unstructured Data with LLMs☆291Updated last month
- Data management with LLMs☆176Updated 10 months ago
- ☆146Updated last year
- A Lightweight Library for AI Observability☆252Updated 9 months ago
- Simple examples using Argilla tools to build AI☆56Updated last year
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models☆115Updated 8 months ago
- This repo is the central repo for all the RAG Evaluation reference material and partner workshop☆77Updated 7 months ago
- Official Repo for CRMArena and CRMArena-Pro☆126Updated last month
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task…☆179Updated last year
- ARAGOG- Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper…☆114Updated last year
- This repository contains a pipeline for fine-tuning Large Language Models (LLMs) for Text-to-SQL conversion using General Reward Proximal…☆39Updated 7 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker☆124Updated last month
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp…☆199Updated last year
- The easiest and most comprehensive framework for building enterprise-grade NL2SQL solutions at scale.☆45Updated 11 months ago
- Query language for blending SQL and LLMs across structured + unstructured data, with type constraints.☆121Updated this week
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F…☆69Updated last year
- Demo of knowledge graph creation and Graph RAG with BAML and Kuzu☆73Updated 2 months ago
- Automated knowledge graph creation SDK☆122Updated last year
- ☆228Updated last year
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…☆78Updated last year
- ☆75Updated last year
- Knowledge Graph Retrieval Augmented Generation (KG-RAG) Eval Datasets☆190Updated last year
- Simple UI for debugging correlations of text embeddings☆302Updated 6 months ago
- Explore the use of DSPy for extracting features from PDFs 🔎☆49Updated last year