alopatenko / LLMEvaluationLinks
A comprehensive guide to LLM evaluation methods, designed to help identify the most suitable evaluation techniques for a given use case, promote best practices in LLM assessment, and critically assess the effectiveness of these methods.
☆149 · Updated this week
Alternatives and similar repositories for LLMEvaluation
Users interested in LLMEvaluation are comparing it to the libraries listed below.
- awesome synthetic (text) datasets ☆305 · Updated 4 months ago
- ☆146 · Updated last year
- A small library of LLM judges (see the judge sketch after this list) ☆301 · Updated 3 months ago
- Sample notebooks and prompts for LLM evaluation ☆153 · Updated last week
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆114 · Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker (see the Elo sketch after this list) ☆122 · Updated 2 weeks ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆297 · Updated last year
- Starter pack for the NeurIPS LLM Efficiency Challenge 2023. ☆126 · Updated 2 years ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated last year
- Benchmark various LLM structured-output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc., on task… ☆179 · Updated last year
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆275 · Updated last year
- [ACL'25] Official code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆314 · Updated 4 months ago
- ☆225 · Updated 11 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆50 · Updated last year
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆442 · Updated last year
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 · Updated last year
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆111 · Updated last year
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆115 · Updated 3 months ago
- Low-latency, high-accuracy custom query routers for humans and agents. Built by Prithivi Da ☆117 · Updated 7 months ago
- Notebooks for training universal zero-shot classifiers on many different tasks ☆136 · Updated 10 months ago
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ☆495 · Updated 9 months ago
- ☆43 · Updated last year
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆247 · Updated 3 months ago
- LangFair is a Python library for conducting use-case level LLM bias and fairness assessments ☆241 · Updated 2 weeks ago
- Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators" ☆134 · Updated 2 years ago
- ☆96 · Updated 7 months ago
- Benchmarking library for RAG ☆239 · Updated last month
- Automatically evaluate your LLMs in Google Colab ☆667 · Updated last year
- Let's build better datasets, together! ☆264 · Updated 10 months ago
- Simple UI for debugging correlations of text embeddings ☆299 · Updated 5 months ago
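
Several repositories above package LLM-as-judge evaluation. As a point of reference for what such libraries automate, here is a minimal pairwise-judge sketch in Python; the prompt, the `judge_pair` helper, and the model name are illustrative assumptions, not the API of any repository listed here.

```python
# Minimal pairwise LLM-as-judge sketch (assumed prompt and model name;
# the judge libraries above each ship their own interfaces).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are an impartial judge. Given a question and two answers,
reply with exactly "A", "B", or "tie" for whichever answer is better.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}"""

def judge_pair(question: str, answer_a: str, answer_b: str) -> str:
    """Return the judge's verdict: "A", "B", or "tie"."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable judge model works here
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, answer_a=answer_a, answer_b=answer_b)}],
        temperature=0.0,  # deterministic verdicts make evaluations repeatable
    )
    return response.choices[0].message.content.strip()
```

In practice, judges are sensitive to answer order, so swapping A and B and re-judging is a common debiasing step.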
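
RAGElo's core idea, ranking RAG agents from pairwise judge verdicts with an Elo rating system, reduces to the standard Elo update. A minimal sketch under that assumption; the `elo_update` helper and agent names are hypothetical, not RAGElo's actual interface:

```python
# Standard Elo update applied to pairwise comparisons between RAG agents.
def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """score_a is 1.0 if A wins the comparison, 0.5 for a tie, 0.0 if A loses."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Start every agent at the same rating and fold in judge verdicts one at a time.
ratings = {"agent_bm25": 1000.0, "agent_hybrid": 1000.0}
ratings["agent_bm25"], ratings["agent_hybrid"] = elo_update(
    ratings["agent_bm25"], ratings["agent_hybrid"], score_a=0.0  # hybrid judged better
)
```

The K-factor controls how quickly ratings move after each comparison; running many shuffled pairwise comparisons and sorting the final ratings yields the leaderboard.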