PAIR-code / llm-comparator
LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
☆481, updated 7 months ago
Alternatives and similar repositories for llm-comparator
Users who are interested in llm-comparator are comparing it to the libraries listed below.
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆989, updated 4 months ago
- Automatically evaluate your LLMs in Google Colab ☆659, updated last year
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆296, updated 2 weeks ago
- Tutorial for building LLM router ☆228, updated last year
- Automated Evaluation of RAG Systems ☆656, updated 5 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆276, updated last year
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆140, updated last month
- A Lightweight Library for AI Observability ☆251, updated 7 months ago
- awesome synthetic (text) datasets ☆297, updated 2 months ago
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆392, updated 2 weeks ago
- A small library of LLM judges ☆282, updated last month
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆183, updated 3 weeks ago
- Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models". ☆641, updated last month
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆314, updated 2 months ago
- ☆231, updated 2 months ago
- Ranking LLMs on agentic tasks ☆184, updated last week
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆282, updated 11 months ago
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels ☆342, updated 9 months ago
- Build datasets using natural language ☆528, updated 4 months ago
- 👩🏻‍🍳 A collection of example notebooks using Haystack ☆501, updated 2 weeks ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆178, updated last year
- End-to-end Generative Optimization for AI Agents ☆645, updated last month
- Simple UI for debugging correlations of text embeddings ☆291, updated 3 months ago
- wandbot is a technical support bot for Weights & Biases' AI developer tools that can run in Discord, Slack, ChatGPT and Zendesk ☆307, updated last month
- Code for explaining and evaluating late chunking (chunked pooling) ☆452, updated 8 months ago
- ☆207, updated 9 months ago
- A tool for evaluating LLMs ☆423, updated last year
- ☆145, updated last year
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤 ☆1,059, updated 7 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆117, updated this week