PAIR-code / llm-comparator
LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
☆444 · Updated 4 months ago
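The app reads evaluation results from a JSON file and renders the two models' responses side-by-side. Below is a minimal sketch in Python of building such a file; the schema here (models, examples, input_text, output_text_a, output_text_b, score) is an assumption based on the project's documented format, so verify the field names against the repo before relying on it.

```python
import json

# Minimal comparison file for LLM Comparator.
# NOTE: this schema is an assumption based on the project's docs;
# check the repo's documented JSON format before use.
comparison = {
    "metadata": {"custom_fields_schema": []},
    "models": [{"name": "model-a"}, {"name": "model-b"}],
    "examples": [
        {
            "input_text": "Explain what an LLM judge is in one sentence.",
            "tags": ["definitions"],
            "output_text_a": "An LLM judge is a model that scores or ranks "
                             "other models' outputs.",
            "output_text_b": "It is a large language model used as an "
                             "automatic evaluator.",
            # Assumed convention: positive scores favor model A,
            # negative scores favor model B.
            "score": 0.5,
        }
    ],
}

# Write the file to load into the LLM Comparator app.
with open("comparison.json", "w") as f:
    json.dump(comparison, f, indent=2)
```

The resulting comparison.json can then be loaded into the hosted app at https://pair-code.github.io/llm-comparator/ to browse the side-by-side comparison.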
Alternatives and similar repositories for llm-comparator
Users interested in llm-comparator are comparing it to the libraries listed below.
- Evaluate your LLM's response with Prometheus and GPT4 💯 · ☆952 · Updated 2 months ago
- awesome synthetic (text) datasets · ☆282 · Updated 7 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… · ☆122 · Updated last month
- ☆668 · Updated last month
- ☆520 · Updated 7 months ago
- A small library of LLM judges · ☆216 · Updated last week
- Automatically evaluate your LLMs in Google Colab · ☆643 · Updated last year
- Easily embed, cluster and semantically label text datasets · ☆552 · Updated last year
- Generative Representational Instruction Tuning · ☆654 · Updated 3 months ago
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models · ☆537 · Updated last year
- A Lightweight Library for AI Observability · ☆246 · Updated 4 months ago
- 🚀 Benchmark Large Language Models Reliably On Your Data · ☆332 · Updated this week
- Automated Evaluation of RAG Systems · ☆613 · Updated 2 months ago
- Automatic evals for LLMs · ☆437 · Updated 2 weeks ago
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and … · ☆345 · Updated last year
- Fine-Tuning Embedding for RAG with Synthetic Data · ☆501 · Updated last year
- Let's build better datasets, together! · ☆259 · Updated 6 months ago
- Build datasets using natural language · ☆493 · Updated last month
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… · ☆173 · Updated 9 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. · ☆242 · Updated 8 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends · ☆1,641 · Updated this week
- SUQL: Conversational Search over Structured and Unstructured Data with LLMs · ☆270 · Updated 3 weeks ago
- Fast Semantic Text Deduplication & Filtering · ☆738 · Updated 3 weeks ago
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. · ☆472 · Updated this week
- Sample notebooks and prompts for LLM evaluation · ☆135 · Updated 2 weeks ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. · ☆1,459 · Updated 3 weeks ago
- wandbot is a technical support bot for Weights & Biases' AI developer tools that can run in Discord, Slack, ChatGPT and Zendesk · ☆301 · Updated this week
- 🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring sa… · ☆921 · Updated 7 months ago
- A tool for evaluating LLMs · ☆419 · Updated last year
- Framework for enhancing LLMs for RAG tasks using fine-tuning. · ☆741 · Updated last month