quotient-ai / judges
A small library of LLM judges
☆193 · Updated this week
Alternatives and similar repositories for judges
Users interested in judges are comparing it to the libraries listed below.
- Verdict is a library for scaling judge-time compute. ☆211 · Updated 2 weeks ago
- A Lightweight Library for AI Observability ☆243 · Updated 2 months ago
- A flexible, adaptive classification system for dynamic text classification ☆165 · Updated last week
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆231 · Updated 7 months ago
- awesome synthetic (text) datasets ☆281 · Updated 6 months ago
- ⚖️ Awesome LLM Judges ⚖️ ☆97 · Updated 2 weeks ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 · Updated 7 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆166 · Updated 7 months ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆100 · Updated last year
- Notebooks for training universal 0-shot classifiers on many different tasks ☆126 · Updated 4 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents using an Elo ranker ☆109 · Updated last month
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆421 · Updated last year
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆116 · Updated last week
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆295 · Updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99 · Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆110 · Updated 8 months ago
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆160 · Updated 4 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆129 · Updated 2 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆86 · Updated 7 months ago
- Synthetic Data for LLM Fine-Tuning ☆115 · Updated last year
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆235 · Updated 3 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆217 · Updated 6 months ago