alopatenko / LLMEvaluationLinks
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
☆123 · updated this week
Alternatives and similar repositories for LLMEvaluation
Users who are interested in LLMEvaluation are comparing it to the libraries listed below.
- Sample notebooks and prompts for LLM evaluation · ☆135 · updated last month
- awesome synthetic (text) datasets · ☆286 · updated last week
- A set of scripts and notebooks on LLM fine-tuning and dataset creation · ☆110 · updated 9 months ago
- Starter pack for NeurIPS LLM Efficiency Challenge 2023 · ☆125 · updated last year
- Notes from the Latent Space paper club. Follow along or start your own! · ☆234 · updated 11 months ago
- This project showcases an LLMOps pipeline that fine-tunes a small LLM to prepare for an outage of the service LLM · ☆307 · updated 3 months ago
- A small library of LLM judges · ☆228 · updated 2 weeks ago
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… · ☆454 · updated 5 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc. on task… · ☆173 · updated 9 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… · ☆107 · updated last year
- Banishing LLM Hallucinations Requires Rethinking Generalization · ☆276 · updated 11 months ago
- ☆144 · updated 11 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples · ☆431 · updated last year
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻🍳 · ☆315 · updated last month
- Simple UI for debugging correlations of text embeddings · ☆287 · updated last month
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate · ☆111 · updated 10 months ago
- ☆154 · updated 7 months ago
- Easily embed, cluster and semantically label text datasets · ☆552 · updated last year
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data · ☆68 · updated last year
- Automatically evaluate your LLMs in Google Colab · ☆649 · updated last year
- wandbot is a technical support bot for Weights & Biases' AI developer tools that can run in Discord, Slack, ChatGPT and Zendesk · ☆302 · updated last week
- Low-latency, high-accuracy custom query routers for humans and agents. Built by Prithivi Da · ☆105 · updated 3 months ago
- ☆204 · updated last year
- ☆195 · updated last year
- ☆78 · updated last year
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.… · ☆346 · updated 4 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG · ☆323 · updated 8 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information · ☆245 · updated 9 months ago
- LangFair is a Python library for conducting use-case level LLM bias and fairness assessments · ☆218 · updated this week
- ☆185 · updated last year