alopatenko / LLMEvaluation
A comprehensive guide to LLM evaluation methods, designed to help identify the most suitable evaluation techniques for a given use case, promote best practices in LLM assessment, and critically examine the effectiveness of these evaluation methods.
☆109 · Updated this week
Alternatives and similar repositories for LLMEvaluation:
Users interested in LLMEvaluation are comparing it to the libraries listed below:
- Simple retrieval from LLMs at various context lengths to measure accuracy (☆99 · updated last year)
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… (☆101 · updated 11 months ago)
- Sample notebooks and prompts for LLM evaluation (☆124 · updated 4 months ago)
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents using an Elo ranker (☆108 · updated this week)
- Building a chatbot powered by a RAG pipeline to read, summarize, and quote the most relevant papers related to the user's query (☆166 · updated 11 months ago)
- Low-latency, high-accuracy custom query routers for humans and agents. Built by Prithivi Da (☆101 · updated 2 weeks ago)
- Starter pack for the NeurIPS LLM Efficiency Challenge 2023 (☆124 · updated last year)
- A set of scripts and notebooks on LLM fine-tuning and dataset creation (☆106 · updated 6 months ago)
- Fiddler Auditor is a tool to evaluate language models (☆178 · updated last year)
- This project showcases an LLMOps pipeline that fine-tunes a small LLM to serve as a fallback in case the LLM service goes down (☆303 · updated last week)
- Awesome synthetic (text) datasets (☆267 · updated 5 months ago)
- Late Interaction Models Training & Retrieval (☆270 · updated this week)
- A small library of LLM judges (☆169 · updated last week)
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases 👨🏻🍳 (☆268 · updated 3 months ago)
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without a custom rubric, reference answer, absolute… (☆49 · updated 9 months ago)
- Testing speed and accuracy of RAG with and without a cross-encoder reranker (☆48 · updated last year)
- This playlab encompasses many projects built with Large Language Models, showcasing the versatility and… (☆111 · updated last week)
- Codebase accompanying the Summary of a Haystack paper (☆77 · updated 6 months ago)
- Attribute (or cite) statements generated by LLMs back to in-context information (☆226 · updated 6 months ago)
- Mistral + Haystack: build RAG pipelines that rock 🤘 (☆103 · updated last year)
- Domain Adapted Language Modeling Toolkit (E2E RAG) (☆319 · updated 5 months ago)
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples (☆417 · updated last year)
- This repository stems from our paper, "Cataloguing LLM Evaluations", and serves as a living, collaborative catalogue of LLM evaluation fr… (☆17 · updated last year)
- Fine-tune an LLM to perform batch inference and online serving (☆107 · updated this week)
- Generalist and Lightweight Model for Text Classification (☆115 · updated this week)
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate (☆107 · updated 7 months ago)
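Several entries above (RAGElo in particular) rank systems via Elo ratings derived from pairwise judge comparisons. As a minimal sketch of that idea (this is not RAGElo's actual API; all names here are illustrative), each head-to-head judgment nudges the two agents' ratings toward the observed outcome:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo model: probability that agent A beats agent B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Apply one pairwise result. score_a: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    # Zero-sum update: A gains exactly what B loses.
    return r_a + k * (score_a - e_a), r_b + k * (e_a - score_a)

# Hypothetical usage: a judge prefers agent_a's answer over agent_b's.
ratings = {"agent_a": 1000.0, "agent_b": 1000.0}
ratings["agent_a"], ratings["agent_b"] = update(
    ratings["agent_a"], ratings["agent_b"], score_a=1.0
)
```

Repeating this over many judged pairs yields a ranking without requiring every agent to be compared against every other.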