IBM / eval-assist
EvalAssist is an open-source project that simplifies using large language models as judges (LLM-as-a-Judge) of the output of other large language models, supporting users in iteratively refining evaluation criteria through a web-based user experience.
★27 · Updated this week
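For context on the LLM-as-a-Judge pattern that EvalAssist builds on, here is a minimal generic sketch in Python. This is not EvalAssist's own API: the criterion text, the `judge` helper, and the model name are illustrative assumptions, and the OpenAI client stands in for any chat-completions endpoint.

```python
# Minimal LLM-as-a-Judge sketch (illustrative; not EvalAssist's API).
# Assumes the `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# A criterion you would iteratively refine, e.g. in a tool like EvalAssist.
CRITERION = (
    "Rate the response for conciseness on a 1-5 scale: "
    "5 = no redundant content, 1 = mostly filler."
)

def judge(prompt: str, response: str, model: str = "gpt-4o-mini") -> str:
    """Ask a judge model to score one response against the criterion."""
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": CRITERION},
            {
                "role": "user",
                "content": f"Prompt:\n{prompt}\n\nResponse:\n{response}\n\n"
                           "Reply with the score and a one-sentence justification.",
            },
        ],
    )
    return completion.choices[0].message.content

if __name__ == "__main__":
    print(judge("Summarize photosynthesis.",
                "Plants convert light into chemical energy."))
```

Tools like EvalAssist wrap this loop in a UI so the criterion itself, rather than the plumbing, becomes the thing you iterate on.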
Alternatives and similar repositories for eval-assist
Users interested in eval-assist are comparing it to the libraries listed below.
- 📦 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … (see the usage sketch after this list) ★199 · Updated this week
- Codebase release for an EMNLP 2023 paper ★19 · Updated last month
- Synthetic Data Generation for Foundation Models ★21 · Updated 4 months ago
- A package dedicated to running benchmark agreement testing ★16 · Updated last month
- The Granite Guardian models are designed to detect risks in prompts and responses. ★88 · Updated 3 months ago
- LM engine is a library for pretraining/finetuning LLMs ★57 · Updated this week
- Python framework that enables you to transform how a user calls or infers an IBM Granite model and how the output from the model is returned… ★30 · Updated this week
- TARGET is a benchmark for evaluating Table Retrieval for Generative Tasks such as Fact Verification and Text-to-SQL ★22 · Updated 2 weeks ago
- Contains all assets to run with the Moonshot Library (Connectors, Datasets and Metrics) ★35 · Updated this week
- Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ★47 · Updated this week
- Embedding Recycling for Language Models ★38 · Updated last year
- Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible. ★42 · Updated 3 months ago
- We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in … ★54 · Updated last year
- Project Debater Early Access Program Tutorial ★24 · Updated last month
- Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation ★33 · Updated 4 months ago
- Efficient multi-prompt evaluation of LLMs ★19 · Updated 6 months ago
- ★17 · Updated 3 months ago
- Code associated with the paper "Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists" ★49 · Updated 3 years ago
- Application code for a generative AI analytics platform ★26 · Updated last month
- PyTorch package to train and audit ML models for Individual Fairness ★66 · Updated last month
- Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency ★35 · Updated 5 months ago
- Truly flash implementation of the DeBERTa disentangled attention mechanism ★58 · Updated last month
- Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data! ★22 · Updated 2 months ago
- Multi-Turn RAG Benchmark ★58 · Updated last month
- ★41 · Updated 5 months ago
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs (EMNLP 2024) ★23 · Updated 7 months ago
- A framework for fine-tuning retrieval-augmented generation (RAG) systems ★112 · Updated this week
- This repository contains data, code, and models for contextual noncompliance ★23 · Updated 11 months ago
- Official Repository for Dataset Inference for LLMs ★34 · Updated 11 months ago
- Counterfactual Local Explanations of AI systems ★28 · Updated 3 years ago
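Since Unitxt tops the list above, a quick sketch of its load/evaluate flow may help when comparing it to eval-assist. This follows Unitxt's documented recipe pattern, but the specific card and template names, and the shape of the returned results, are assumptions; check the Unitxt catalog and docs for your installed version.

```python
# Sketch of a Unitxt evaluation loop (card/template names are assumptions;
# consult the Unitxt catalog for your installed version).
from unitxt import load_dataset, evaluate

# Build a dataset from a catalog card and template recipe.
dataset = load_dataset(
    card="cards.wnli",
    template="templates.classification.multi_class.relation.default",
)

# Predictions would normally come from the model under evaluation;
# here we fake them to keep the sketch self-contained.
test_set = dataset["test"]
predictions = ["entailment" for _ in test_set]

# Score predictions with the metrics the card specifies.
# (The exact structure of `results` varies across Unitxt versions.)
results = evaluate(predictions=predictions, data=test_set)
print(results)
```

The design difference from eval-assist is worth noting: Unitxt scores outputs against a catalog of predefined metrics, while EvalAssist focuses on interactively authoring and refining the judging criteria themselves.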