aiverify-foundation / LLM-Evals-CatalogueLinks

This repository stems from our paper, “Cataloguing LLM Evaluations”, and serves as a living, collaborative catalogue of LLM evaluation frameworks, benchmarks and papers.

☆18

Alternatives and similar repositories for LLM-Evals-Catalogue

Users that are interested in LLM-Evals-Catalogue are comparing it to the libraries listed below

Sorting:

rajshah4 / LLM-Evaluation
Sample notebooks and prompts for LLM evaluation
☆151Updated 2 weeks ago
hwchase17 / langfuzz
☆73Updated last year
cvs-health / langfair
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
☆239Updated this week
alopatenko / LLMEvaluation
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use…
☆144Updated 2 weeks ago
rungalileo / hallucination-index
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
☆115Updated 3 months ago
jjovalle99 / raft-well-architected
☆20Updated last year
apple / ml-superposition-prompting
☆146Updated last year
IBM / eval-assist
EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other la…
☆90Updated last week
AI-Maker-Space / Awesome-AIM-Index
An index of all of our weekly concepts + code events for aspiring AI Engineers and Business Leaders!!
☆87Updated this week
predlico / ARAGOG
ARAGOG- Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper…
☆114Updated last year
marib00 / llamaindex-embedding-lora
☆30Updated last year
jjovalle99 / DSPy-Text2SQL
DSPY on action with OpenSource LLMs.
☆97Updated last year
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆50Updated last year
CYQIQ / MultiCoT
Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph
☆146Updated last year
qdrant / qdrant-rag-eval
This repo is the central repo for all the RAG Evaluation reference material and partner workshop
☆76Updated 6 months ago
VectorInstitute / fed-rag
A framework for fine-tuning retrieval-augmented generation (RAG) systems.
☆132Updated this week
MoritzLaurer / synthetic-data-blog
This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data
☆68Updated last year
weaviate / gorilla
Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM.
☆136Updated 2 months ago
stephenleo / llm-structured-output-benchmarks
Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task…
☆179Updated last year
anakin87 / mistral-haystack
Mistral + Haystack: build RAG pipelines that rock 🤘
☆106Updated last year
statice / awesome-synthetic-data
A curated list of awesome synthetic data tools (open source and commercial).
☆217Updated last year
lz-chen / research-agent
☆89Updated 5 months ago
akashmathur-2212 / LLMs-playground
What, Why and How of LLMs.
☆75Updated last month
saharmor / voice-lab
Testing and evaluation framework for voice agents
☆152Updated 4 months ago
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆56Updated 11 months ago
deepset-ai / rag-with-nvidia-nims
🚀 Use NVIDIA NIMs with Haystack pipelines
☆31Updated last year
AI-ANK / RAGArch
RAGArch is a Streamlit-based application that empowers users to experiment with various components and parameters of Retrieval-Augmented …
☆87Updated last year
misbahsy / RAGTune
Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon)
☆262Updated last year
timlrx / simple-ai-agents
Simple AI agents / assistants
☆49Updated last year
pavanjava / bootstrap-rag
this project will bootstrap and scaffold the projects for specific semantic search and RAG applications along with regular boiler plate c…
☆92Updated 10 months ago