alopatenko / LLMEvaluation
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
☆121 Updated last month
Alternatives and similar repositories for LLMEvaluation
Users who are interested in LLMEvaluation are comparing it to the repositories listed below.
- Sample notebooks and prompts for LLM evaluation ☆135 Updated 2 weeks ago
- awesome synthetic (text) datasets ☆282 Updated 7 months ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆111 Updated 8 months ago
- ☆144 Updated 11 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆107 Updated last year
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆313 Updated 3 weeks ago
- Benchmarking library for RAG ☆209 Updated last week
- ☆195 Updated last year
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆240 Updated 8 months ago
- ☆77 Updated last year
- Starter pack for NeurIPS LLM Efficiency Challenge 2023. ☆124 Updated last year
- wandbot is a technical support bot for Weights & Biases' AI developer tools that can run in Discord, Slack, ChatGPT and Zendesk ☆300 Updated 2 weeks ago
- A small library of LLM judges ☆216 Updated this week
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆329 Updated this week
- Late Interaction Models Training & Retrieval ☆444 Updated 2 weeks ago
- This project showcases an LLMOps pipeline that fine-tunes a small LLM to prepare for an outage of the service LLM. ☆306 Updated 2 months ago
- GenAIOps on Kubernetes: A collection of reference architectures for running GenAI at scale on Kubernetes using OSS tooling ☆130 Updated 7 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 Updated 11 months ago
- ☆152 Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 Updated 9 months ago
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 Updated last year
- Simple retrieval from LLMs at various context lengths to measure accuracy ☆99 Updated last year
- Fine-tune an LLM to perform batch inference and online serving. ☆112 Updated 3 weeks ago
- Fiddler Auditor is a tool to evaluate language models. ☆183 Updated last year
- Generalist and Lightweight Model for Text Classification ☆133 Updated last week
- A Lightweight Library for AI Observability ☆245 Updated 4 months ago
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.… ☆331 Updated 3 months ago
- ☆211 Updated 11 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc. on task… ☆173 Updated 9 months ago
- Building a chatbot powered with a RAG pipeline to read, summarize and quote the most relevant papers related to the user query. ☆167 Updated last year