alopatenko / LLMEvaluationLinks
A comprehensive guide to LLM evaluation methods, designed to help identify the most suitable evaluation techniques for a given use case, promote best practices in LLM assessment, and critically examine the effectiveness of these methods.
☆129 · Updated 3 weeks ago
Alternatives and similar repositories for LLMEvaluation
Users interested in LLMEvaluation are comparing it to the libraries listed below.
- Sample notebooks and prompts for LLM evaluation ☆138 · Updated last month
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆110 · Updated 10 months ago
- Starter pack for the NeurIPS LLM Efficiency Challenge 2023. ☆125 · Updated last year
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆313 · Updated 3 weeks ago
- awesome synthetic (text) datasets ☆291 · Updated 3 weeks ago
- ☆145 · Updated last year
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc. on task… ☆173 · Updated 10 months ago
- Low-latency, high-accuracy custom query routers for humans and agents. Built by Prithivi Da ☆111 · Updated 4 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆108 · Updated last year
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆222 · Updated 3 weeks ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without a custom rubric, reference answer, absolute… ☆49 · Updated last year
- A small library of LLM judges (a minimal sketch of the pairwise judge pattern appears after this list) ☆248 · Updated this week
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 · Updated last year
- LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments ☆222 · Updated last week
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆317 · Updated 2 months ago
- ☆77 · Updated last year
- Automatically evaluate your LLMs in Google Colab ☆649 · Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker (see the Elo-update sketch after this list) ☆114 · Updated 3 weeks ago
- Notes from the Latent Space paper club. Follow along or start your own! ☆235 · Updated last year
- wandbot is a technical support bot for Weights & Biases' AI developer tools that can run in Discord, Slack, ChatGPT, and Zendesk ☆302 · Updated this week
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆113 · Updated last week
- Domain Adapted Language Modeling Toolkit: E2E RAG ☆325 · Updated 8 months ago
- Simple UI for debugging correlations of text embeddings ☆288 · Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated 10 months ago
- Fine-tune an LLM to perform batch inference and online serving. ☆112 · Updated 2 months ago
- Mistral + Haystack: build RAG pipelines that rock 🤘 ☆105 · Updated last year
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆434 · Updated last year
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆261 · Updated 9 months ago
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ☆464 · Updated 5 months ago
- Building a chatbot powered by a RAG pipeline to read, summarize, and quote the most relevant papers related to the user query. ☆168 · Updated last year
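
Several repositories above (the small library of LLM judges, PHUDGE) revolve around the pairwise LLM-as-judge pattern: a model is prompted with a question and two candidate answers and asked which is better. As rough orientation, here is a minimal sketch of that pattern; it is not any of these libraries' actual API, and `call_llm` is a hypothetical placeholder for whatever chat-completion client you use.

```python
# Minimal pairwise LLM-as-judge sketch (illustrative; not a specific library's API).

JUDGE_PROMPT = """You are an impartial judge. Given a question and two answers,
reply with exactly "A", "B", or "TIE" for whichever answer is better.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}
Verdict:"""


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real chat-completion client here."""
    raise NotImplementedError


def judge_pair(question: str, answer_a: str, answer_b: str) -> float:
    """Return 1.0 if A wins, 0.0 if B wins, 0.5 for a tie or an unparsable verdict."""
    verdict = call_llm(JUDGE_PROMPT.format(
        question=question, answer_a=answer_a, answer_b=answer_b,
    )).strip().upper()
    return {"A": 1.0, "B": 0.0}.get(verdict, 0.5)
```

In practice, judge libraries add mitigations this sketch omits, such as swapping the A/B order to control for position bias and averaging the two verdicts.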
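RAGElo's premise is that many such pairwise verdicts can be aggregated into a ranking of agents via Elo updates. Below is a minimal sketch of the standard Elo update rule, not RAGElo's implementation; the function name and K-factor default are my own choices. A `judge_pair` score like the one above can be fed in as `score_a`.

```python
# Standard Elo rating update (illustrative sketch, assuming K=32 and the usual
# 400-point logistic scale; RAGElo's actual internals may differ).

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Update ratings for agents A and B after one pairwise comparison.

    score_a is 1.0 if A's answer was judged better, 0.0 if B's was, 0.5 for a tie.
    """
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta


# Example: two RAG agents start at 1500; A wins one judged comparison.
ra, rb = elo_update(1500.0, 1500.0, score_a=1.0)
print(ra, rb)  # 1516.0 1484.0 — A gains 16 points, B loses 16 (with K=32)
```

Repeating this update over a batch of judged question/answer pairs converges toward a stable leaderboard of agents.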