alopatenko / LLMEvaluation
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
☆115 · Updated last week
Alternatives and similar repositories for LLMEvaluation:
Users interested in LLMEvaluation are comparing it to the libraries listed below.
- ☆143 · Updated 9 months ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆107 · Updated 7 months ago
- Sample notebooks and prompts for LLM evaluation ☆124 · Updated 2 weeks ago
- ☆77 · Updated 11 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆179 · Updated last year
- awesome synthetic (text) datasets ☆278 · Updated 6 months ago
- Starter pack for the NeurIPS LLM Efficiency Challenge 2023. ☆124 · Updated last year
- Low-latency, high-accuracy, custom query routers for humans and agents. Built by Prithivi Da ☆102 · Updated last month
- This project showcases an LLMOps pipeline that fine-tunes a small LLM to prepare for an outage of the service LLM. ☆304 · Updated last month
- Performs simple retrieval from LLMs at various context lengths to measure accuracy ☆99 · Updated last year
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆277 · Updated last week
- Building a chatbot powered by a RAG pipeline to read, summarize and quote the most relevant papers related to the user query. ☆166 · Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆108 · Updated 3 weeks ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆102 · Updated last year
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ☆417 · Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆77 · Updated 7 months ago
- ☆151 · Updated 5 months ago
- Fine-tune an LLM to perform batch inference and online serving. ☆110 · Updated this week
- ☆36 · Updated 9 months ago
- Experiments with inference on Llama ☆104 · Updated 10 months ago
- Notes from the Latent Space paper club. Follow along or start your own! ☆230 · Updated 9 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆273 · Updated 9 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated 9 months ago
- A small library of LLM judges ☆185 · Updated last week
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆108 · Updated 7 months ago
- Resources relating to the DLAI event: https://www.youtube.com/watch?v=eTieetk2dSw ☆184 · Updated last year
- Benchmarking library for RAG ☆193 · Updated this week
- Complete implementation of Llama2 with/without KV cache & inference 🚀 ☆47 · Updated 11 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆320 · Updated 5 months ago
- ☆195 · Updated last year