aiverify-foundation / LLM-Evals-Catalogue
This repository stems from our paper, “Cataloguing LLM Evaluations”, and serves as a living, collaborative catalogue of LLM evaluation frameworks, benchmarks and papers.
☆14 · Updated last year
Related projects
Alternatives and complementary repositories for LLM-Evals-Catalogue
- Sample notebooks and prompts for LLM evaluation ☆114 · Updated last week
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆96 · Updated 7 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆106 · Updated this week
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆67 · Updated this week
- Knowledge Graph Retrieval Augmented Generation (KG-RAG) Eval Datasets ☆129 · Updated 7 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 · Updated 8 months ago
- Notebooks and articles related to LLMs ☆24 · Updated 10 months ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆100 · Updated 2 months ago
- TalkToModel gives anyone the power of XAI (explainable AI) through natural language conversations 💬! ☆113 · Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 4 months ago
- An index of all of our weekly concepts + code events for aspiring AI engineers and business leaders! ☆50 · Updated this week
- Low-latency, high-accuracy custom query routers for co-pilots and agents. Built by Prithivi Da ☆52 · Updated this week
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ☆41 · Updated 11 months ago
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆104 · Updated last month
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆61 · Updated 9 months ago
- What, Why and How of LLMs. ☆68 · Updated 9 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆72 · Updated 2 months ago
- Let's build better datasets, together! ☆206 · Updated this week
- Starter pack for the NeurIPS LLM Efficiency Challenge 2023. ☆118 · Updated last year
- A repository to perform self-instruct with a model on the HF Hub ☆31 · Updated last year
- Toolkit for attaching, training, saving, and loading new heads for transformer models ☆246 · Updated 2 weeks ago