alopatenko / LLMEvaluationLinks
A comprehensive guide to LLM evaluation methods, designed to help identify the most suitable evaluation techniques for a given use case, promote best practices in LLM assessment, and critically assess the effectiveness of these methods.
☆174 · Updated last week
Alternatives and similar repositories for LLMEvaluation
Users interested in LLMEvaluation are comparing it to the libraries listed below.
- Sample notebooks and prompts for LLM evaluation ☆159 · Updated 3 months ago
- ☆147 · Updated last year
- awesome synthetic (text) datasets ☆321 · Updated 3 weeks ago
- A small library of LLM judges ☆319 · Updated 6 months ago
- ARAGOG - Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆113 · Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents using an Elo ranker (a minimal sketch of the Elo update appears after this list) ☆126 · Updated 3 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc. on task… ☆184 · Updated last year
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆116 · Updated 6 months ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆115 · Updated last year
- Starter pack for the NeurIPS LLM Efficiency Challenge 2023. ☆129 · Updated 2 years ago
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ☆46 · Updated 2 years ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆317 · Updated last year
- Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators" ☆137 · Updated 2 years ago
- LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments ☆251 · Updated 3 weeks ago
- Codebase accompanying the "Summary of a Haystack" paper. ☆80 · Updated last year
- Ranking LLMs on agentic tasks ☆210 · Updated 2 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆277 · Updated last year
- ☆43 · Updated last year
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆446 · Updated last year
- Official repo for the paper "PHUDGE: Phi-3 as Scalable Judge". Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆51 · Updated last year
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ☆518 · Updated 11 months ago
- Benchmarking library for RAG ☆255 · Updated last week
- Low-latency, high-accuracy custom query routers for humans and agents. Built by Prithivi Da ☆119 · Updated 10 months ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳
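
The Elo ranker mentioned in the RAGElo entry above follows the same pairwise-update scheme used for chess ratings. Below is a minimal sketch of that update in Python, assuming an LLM judge has already produced pairwise win/tie/loss verdicts between agents answering the same queries; the agent names, the `verdicts` list, and the K-factor are all illustrative assumptions, and this is not RAGElo's actual API.

```python
# Minimal Elo-style ranking from pairwise judgments: the idea behind tools
# like RAGElo. All names and values here are illustrative, not RAGElo's API.
from collections import defaultdict

K = 32  # update step size (assumed; the common chess default)

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that agent A beats agent B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, a: str, b: str, outcome: float) -> None:
    """outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    e_a = expected_score(ratings[a], ratings[b])
    ratings[a] += K * (outcome - e_a)
    ratings[b] += K * ((1.0 - outcome) - (1.0 - e_a))

# Hypothetical pairwise verdicts, e.g. produced by an LLM judge comparing
# two RAG agents' answers to the same query: (agent_a, agent_b, outcome_for_a)
verdicts = [
    ("bm25_rag", "dense_rag", 0.0),
    ("dense_rag", "hybrid_rag", 0.5),
    ("bm25_rag", "hybrid_rag", 0.0),
]

ratings = defaultdict(lambda: 1000.0)  # every agent starts at 1000
for a, b, outcome in verdicts:
    update(ratings, a, b, outcome)

print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Elo's appeal for this setting is that it only needs relative judgments ("which answer is better?"), which LLM judges give far more reliably than absolute scores, and it aggregates many noisy pairwise comparisons into a single leaderboard.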