GoogleCloudPlatform / evalbench
EvalBench is a flexible framework designed to measure the quality of generative AI (GenAI) workflows around database-specific tasks.
☆17Updated this week
Alternatives and similar repositories for evalbench:
Users who are interested in evalbench are comparing it to the libraries listed below.
- ☆45Updated 5 months ago
- ☆74Updated 6 months ago
- Query language for blending SQL logic and LLM reasoning across structured + unstructured data. [Findings of ACL 2024]☆95Updated this week
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆50Updated 2 months ago
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation☆72Updated 11 months ago
- Efficient BM25 indexing using Rust☆16Updated 7 months ago
- Introduction page of a challenging text-to-SQL dataset: KaggleDBQA☆36Updated last year
- [COLING'25] Gen-SQL: Efficient Text-to-SQL By Bridging Natural Language Question And Database Schema With Pseudo-Schema☆10Updated 3 months ago
- A structured framework for defining, verifying and certifying AI systems.☆11Updated last month
- Leverage your LangChain trace data for fine tuning☆41Updated 8 months ago
- Chrome Extension for exploring Hugging Face datasets 🔎☆49Updated 7 months ago
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data☆68Updated last year
- Testing speed and accuracy of RAG with, and without Cross Encoder Reranker.☆48Updated last year
- Using Large Language Models (LLMs) to convert natural language queries to SQL☆45Updated 6 months ago
- An application to write and run SQL queries, returning answers to natural language questions, using langchain and open source LLM models …☆33Updated last year
- Code and data for the paper "DBCᴏᴘɪʟᴏᴛ: Natural Language Querying over Massive Database via Schema Routing" (EDBT 2025)☆97Updated last month
- This repository combines llama workflows and agents, which is a powerful concept.☆17Updated 8 months ago
- Evaluation of bm42 sparse indexing algorithm☆65Updated 9 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use…☆115Updated this week
- ☆19Updated 6 months ago
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024]☆151Updated 3 months ago
- A Hands-on Practical Guide to LlamaIndex☆33Updated 6 months ago
- ☆62Updated 9 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆49Updated 9 months ago
- "Syntriever: How to Train Your Retriever with Synthetic Data from LLMs" the Nations of the Americas Chapter of the Association for Comput…☆25Updated last month
- ☆16Updated last year
- DSPy in action with open-source LLMs.☆70Updated last year
- Benchmark study on LanceDB, an embedded vector DB, for full-text search and vector search☆24Updated last year
- 💻 An open-source vibe-coding platform today. The next generation IDE tomorrow.☆19Updated this week
- This repository contains all the code for the DTS-SQL paper☆51Updated 8 months ago