PAIR-code / llm-comparator
LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
☆322Updated last month
Related projects
Alternatives and complementary repositories for llm-comparator
- awesome synthetic (text) datasets☆242Updated 3 weeks ago
- Automatically evaluate your LLMs in Google Colab☆559Updated 6 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯☆797Updated 2 months ago
- ☆451Updated 3 weeks ago
- ☆127Updated 3 months ago
- Let's build better datasets, together!☆205Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends☆811Updated this week
- A tool for evaluating LLMs☆392Updated 6 months ago
- Tutorial for building LLM router☆163Updated 4 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆1,634Updated this week
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.☆100Updated 2 months ago
- Automated Evaluation of RAG Systems☆484Updated 2 weeks ago
- Banishing LLM Hallucinations Requires Rethinking Generalization☆261Updated 4 months ago
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and …☆328Updated 5 months ago
- Building a chatbot powered by a RAG pipeline to read, summarize and quote the most relevant papers related to the user query.☆162Updated 6 months ago
- ☆214Updated last week
- AWM: Agent Workflow Memory☆205Updated last month
- ☆131Updated 4 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.☆1,095Updated last week
- Code for explaining and evaluating late chunking (chunked pooling)☆246Updated last month
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples.☆385Updated 9 months ago
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs".☆192Updated 2 months ago
- Code and Data for Tau-Bench☆201Updated 3 weeks ago
- Manage scalable open LLM inference endpoints in Slurm clusters☆236Updated 4 months ago
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"☆448Updated 8 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs☆165Updated 2 weeks ago
- [ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically d…☆287Updated last year
- The official evaluation suite and dynamic data release for MixEval.☆224Updated last week
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker☆106Updated 3 weeks ago