aiverify-foundation / LLM-Evals-Catalogue
This repository stems from our paper, “Cataloguing LLM Evaluations”, and serves as a living, collaborative catalogue of LLM evaluation frameworks, benchmarks and papers.
☆18 · Updated last year
Alternatives and similar repositories for LLM-Evals-Catalogue
Users who are interested in LLM-Evals-Catalogue are comparing it to the libraries listed below.
- ☆145 · Updated last year
- Sample notebooks and prompts for LLM evaluation ☆138 · Updated 2 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆138 · Updated 2 weeks ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆114 · Updated last month
- ☆20 · Updated last year
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆109 · Updated last year
- LangFair is a Python library for conducting use-case level LLM bias and fairness assessments ☆231 · Updated this week
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 · Updated last year
- ☆72 · Updated 10 months ago
- 🦜💯 Flex those feathers! ☆252 · Updated 10 months ago
- Benchmark various LLM structured output frameworks (Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc.) on task… ☆176 · Updated 11 months ago
- DSPy in action with open-source LLMs. ☆75 · Updated last year
- A Lightweight Library for AI Observability ☆250 · Updated 6 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆187 · Updated last year
- ☆30 · Updated last year
- Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph ☆147 · Updated last year
- ☆185 · Updated last year
- Simple examples using Argilla tools to build AI ☆55 · Updated 9 months ago
- This repository implements the chain of verification paper by Meta AI ☆176 · Updated last year
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆276 · Updated last year
- RAGArch is a Streamlit-based application that empowers users to experiment with various components and parameters of Retrieval-Augmented … ☆85 · Updated last year
- A semantic research engine to get relevant papers based on a user query. Application frontend with Chainlit Copilot. Observability with L… ☆83 · Updated last year
- A reimplementation of LangGraph's customer support example in Rasa's CALM paradigm and a quantitative evaluation of the two approaches ☆80 · Updated 5 months ago
- Tuning and evaluation of a RAG pipeline (automated optimization to be added soon). ☆264 · Updated last year
- A library for evaluating Retrieval-Augmented Generation (RAG) systems (the traditional ways). ☆38 · Updated last year
- A curated list of awesome synthetic data tools (open source and commercial). ☆202 · Updated last year
- Python SDK for running evaluations on LLM generated responses ☆291 · Updated 2 months ago
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs" ☆227 · Updated 2 months ago
- ☆76 · Updated 6 months ago
- ☆89 · Updated 3 months ago