alopatenko / LLMEvaluationLinks
A comprehensive guide to LLM evaluation methods, designed to help identify the most suitable evaluation techniques for a given use case, promote best practices in LLM assessment, and critically examine the effectiveness of these evaluation methods.
☆118 · Updated 3 weeks ago
Alternatives and similar repositories for LLMEvaluation
Users interested in LLMEvaluation are comparing it to the libraries listed below.
- Sample notebooks and prompts for LLM evaluation ☆128 · Updated last week
- A small library of LLM judges ☆205 · Updated last week
- ARAGOG - Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆104 · Updated last year
- awesome synthetic (text) datasets ☆281 · Updated 7 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents using an Elo ranker ☆111 · Updated last week
- The central repository for all RAG evaluation reference material and partner workshops ☆64 · Updated last month
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆110 · Updated 8 months ago
- ☆143 · Updated 10 months ago
- ☆152 · Updated 6 months ago
- ☆77 · Updated 11 months ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆111 · Updated 8 months ago
- Doing simple retrieval from LLMs at various context lengths to measure accuracy ☆99 · Updated last year
- This project showcases an LLMOps pipeline that fine-tunes a small LLM to prepare for outages of the service LLM. ☆305 · Updated 2 months ago
- Late Interaction Models Training & Retrieval ☆395 · Updated this week
- ☆77 · Updated last year
- Fiddler Auditor is a tool to evaluate language models. ☆181 · Updated last year
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆318 · Updated this week
- Low-latency, high-accuracy custom query routers for humans and agents. Built by Prithivi Da ☆105 · Updated 2 months ago
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 · Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆78 · Updated 8 months ago
- Simple examples using Argilla tools to build AI ☆53 · Updated 6 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆150 · Updated this week
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc. on task… ☆172 · Updated 8 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated 10 months ago
- Building a chatbot powered by a RAG pipeline to read, summarize, and quote the most relevant papers related to the user query. ☆167 · Updated last year
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆235 · Updated 7 months ago
- ☆52 · Updated last year
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆321 · Updated 6 months ago
- This is the repo for the LegalBench-RAG paper: https://arxiv.org/abs/2408.10343. ☆90 · Updated this week
- Fine-tune an LLM to perform batch inference and online serving. ☆111 · Updated 3 weeks ago
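Several of the repositories above rank systems from pairwise LLM-judge comparisons (e.g. RAGElo's Elo ranker). As a minimal sketch of the underlying idea, here is the standard Elo rating update applied to one pairwise comparison between two RAG agents. This is a hypothetical standalone implementation of the generic Elo formula, not RAGElo's actual API; the K-factor and 400-point scale are the conventional defaults.

```python
# Sketch of the Elo rating update used by pairwise rankers like RAGElo.
# Hypothetical standalone code, not taken from any of the listed repos.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that agent A beats agent B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return (new_r_a, new_r_b) after one judged pairwise comparison."""
    e_a = expected_score(r_a, r_b)          # A's expected score
    s_a = 1.0 if a_won else 0.0             # A's actual score
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b

# Two agents start at 1000; an LLM judge prefers agent A's answer.
a, b = elo_update(1000.0, 1000.0, a_won=True)  # → (1016.0, 984.0)
```

Repeating this update over many judged answer pairs converges each agent toward a rating reflecting its win rate against the field, which is how a leaderboard emerges from purely pairwise judgments.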