alopatenko / LLMEvaluation
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
☆100 Updated this week
Alternatives and similar repositories for LLMEvaluation:
Users who are interested in LLMEvaluation are comparing it to the repositories listed below:
- ☆142 Updated 8 months ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆107 Updated 6 months ago
- Sample notebooks and prompts for LLM evaluation ☆123 Updated 3 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99 Updated 11 months ago
- awesome synthetic (text) datasets ☆265 Updated 4 months ago
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆101 Updated last week
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 Updated last year
- Starter pack for NeurIPS LLM Efficiency Challenge 2023. ☆124 Updated last year
- ☆150 Updated 3 months ago
- ☆77 Updated 9 months ago
- ☆76 Updated 9 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 Updated 8 months ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆105 Updated 5 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆177 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆75 Updated 6 months ago
- Building a chatbot powered with a RAG pipeline to read, summarize and quote the most relevant papers related to the user query. ☆166 Updated 11 months ago
- Mistral + Haystack: build RAG pipelines that rock 🤘 ☆103 Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 7 months ago
- ARAGOG - Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆102 Updated 11 months ago
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆183 Updated last week
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆316 Updated 4 months ago
- experiments with inference on llama ☆104 Updated 9 months ago
- End-to-End LLM Guide ☆104 Updated 8 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆73 Updated 5 months ago
- This repository contains the code for dataset curation and finetuning of instruct variant of the Bilingual OpenHathi model. The resultin… ☆23 Updated last year
- ☆51 Updated 9 months ago
- Chunk your text using gpt4o-mini more accurately ☆44 Updated 7 months ago
- LangFair is a Python library for conducting use-case level LLM bias and fairness assessments ☆184 Updated 2 weeks ago
- This project showcases an LLMOps pipeline that fine-tunes a small-size LLM model to prepare for the outage of the service LLM. ☆302 Updated last month
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆99 Updated last year