PAIR-code / llm-comparator
LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
☆ 464 · Updated 5 months ago
Alternatives and similar repositories for llm-comparator
Users interested in llm-comparator are comparing it to the libraries listed below.
- Evaluate your LLM's response with Prometheus and GPT-4 · ☆ 978 · Updated 3 months ago
- Automatically evaluate your LLMs in Google Colab · ☆ 649 · Updated last year
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… · ☆ 130 · Updated last week
- Benchmark Large Language Models Reliably On Your Data · ☆ 381 · Updated this week
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs · ☆ 313 · Updated 3 weeks ago
- A tool for evaluating LLMs · ☆ 424 · Updated last year
- Automated Evaluation of RAG Systems · ☆ 637 · Updated 4 months ago
- A Lightweight Library for AI Observability · ☆ 250 · Updated 5 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. · ☆ 268 · Updated 10 months ago
- ☆ 145 · Updated last year
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle · ☆ 289 · Updated last week
- ☆ 222 · Updated last month
- An open-source tool for general prompt optimization. · ☆ 590 · Updated last week
- awesome synthetic (text) datasets · ☆ 291 · Updated last month
- Banishing LLM Hallucinations Requires Rethinking Generalization · ☆ 276 · Updated last year
- Tutorial for building an LLM router · ☆ 221 · Updated last year
- A small library of LLM judges · ☆ 248 · Updated last week
- A library for prompt engineering and optimization (SAMMO = Structure-aware Multi-Objective Metaprompt Optimization) · ☆ 716 · Updated last month
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. · ☆ 434 · Updated last year
- Framework for enhancing LLMs for RAG tasks using fine-tuning. · ☆ 747 · Updated 2 months ago
- Benchmark various LLM structured-output frameworks (Instructor, Mirascope, LangChain, LlamaIndex, Fructose, Marvin, Outlines, etc.) on task… · ☆ 173 · Updated 10 months ago
- An awesome list of curated DSPy resources. · ☆ 395 · Updated 5 months ago
- Ranking LLMs on agentic tasks · ☆ 176 · Updated 3 weeks ago
- Build datasets using natural language · ☆ 507 · Updated 2 months ago
- Sample notebooks and prompts for LLM evaluation · ☆ 138 · Updated 2 months ago
- Easily embed, cluster and semantically label text datasets · ☆ 560 · Updated last year
- Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models". · ☆ 629 · Updated this week
- ☆ 187 · Updated 8 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… · ☆ 108 · Updated last year
- Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph · ☆ 147 · Updated last year