alopatenko / LLMEvaluation
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
☆92 Updated this week
Alternatives and similar repositories for LLMEvaluation:
Users who are interested in LLMEvaluation are comparing it to the repositories listed below
- Sample notebooks and prompts for LLM evaluation ☆120 Updated 2 months ago
- This project showcases an LLMOps pipeline that fine-tunes a small LLM to prepare for an outage of the service LLM. ☆295 Updated 2 months ago
- ☆139 Updated 6 months ago
- awesome synthetic (text) datasets ☆259 Updated 3 months ago
- ☆147 Updated 2 months ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆102 Updated 4 months ago
- ☆77 Updated 8 months ago
- ☆76 Updated 8 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆197 Updated 4 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆313 Updated 3 months ago
- Starter pack for NeurIPS LLM Efficiency Challenge 2023. ☆124 Updated last year
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆93 Updated last month
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.… ☆230 Updated 4 months ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆253 Updated last month
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆105 Updated 5 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 7 months ago
- ☆165 Updated 8 months ago
- Late Interaction Models Training & Retrieval ☆236 Updated last week
- ARAGOG- Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆101 Updated 10 months ago
- ☆194 Updated 9 months ago
- Check for data drift between two OpenAI multi-turn chat jsonl files. ☆37 Updated 10 months ago
- Let's build better datasets, together! ☆252 Updated last month
- End-to-End LLM Guide ☆101 Updated 7 months ago
- Doing simple retrieval from LLMs at various context lengths to measure accuracy ☆100 Updated 10 months ago
- Tutorial for building LLM router ☆179 Updated 6 months ago
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆63 Updated 11 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆106 Updated this week
- Mistral + Haystack: build RAG pipelines that rock 🤘 ☆100 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆74 Updated 4 months ago
- Complete implementation of Llama2 with/without KV cache & inference 🚀 ☆47 Updated 8 months ago