alopatenko / LLMEvaluationLinks
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
☆155 Updated this week
Alternatives and similar repositories for LLMEvaluation
Users who are interested in LLMEvaluation compare it to the repositories listed below.
- Sample notebooks and prompts for LLM evaluation ☆156 Updated last month
- Awesome synthetic (text) datasets ☆310 Updated 2 weeks ago
- ☆146 Updated last year
- A small library of LLM judges ☆301 Updated 4 months ago
- Starter pack for NeurIPS LLM Efficiency Challenge 2023. ☆129 Updated 2 years ago
- ARAGOG - Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆114 Updated last year
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc. on task… ☆179 Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆124 Updated last month
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆302 Updated last year
- ☆43 Updated last year
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆111 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated last year
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆116 Updated 4 months ago
- ☆228 Updated last year
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 Updated last year
- Low-latency, high-accuracy, custom query routers for humans and agents. Built by Prithivi Da ☆117 Updated 8 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆51 Updated last year
- LangFair is a Python library for conducting use-case level LLM bias and fairness assessments ☆242 Updated last week
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆314 Updated 4 months ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆299 Updated last month
- Generalist and Lightweight Model for Text Classification ☆165 Updated last week
- Knowledge Graph Retrieval Augmented Generation (KG-RAG) Eval Datasets ☆191 Updated last year
- ☆80 Updated last year
- Simple UI for debugging correlations of text embeddings ☆302 Updated 6 months ago
- wandbot is a technical support bot for Weights & Biases' AI developer tools that can run in Discord, Slack, ChatGPT and Zendesk ☆309 Updated last month
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻🍳 ☆343 Updated 6 months ago
- Collection of links, tutorials and best practices on how to collect data and build an end-to-end RLHF system to fine-tune Generative AI m… ☆224 Updated 2 years ago
- Mistral + Haystack: build RAG pipelines that rock 🤘 ☆106 Updated last year
- Benchmarking library for RAG ☆248 Updated last month
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆443 Updated last year