PAIR-code / llm-comparator
LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
☆423 Updated 3 months ago
Alternatives and similar repositories for llm-comparator
Users who are interested in llm-comparator are comparing it to the libraries listed below.
- awesome synthetic (text) datasets ☆281 Updated 6 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆938 Updated 3 weeks ago
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆295 Updated this week
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆116 Updated last week
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆522 Updated 10 months ago
- Automated Evaluation of RAG Systems ☆590 Updated last month
- ☆515 Updated 5 months ago
- A Lightweight Library for AI Observability ☆243 Updated 2 months ago
- Let's build better datasets, together! ☆259 Updated 4 months ago
- Automatically evaluate your LLMs in Google Colab ☆622 Updated last year
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆229 Updated 7 months ago
- wandbot is a technical support bot for Weights & Biases' AI developer tools that can run in Discord, Slack, ChatGPT and Zendesk ☆299 Updated 3 weeks ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆703 Updated last week
- Code and Data for Tau-Bench ☆485 Updated 3 months ago
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆114 Updated 2 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆274 Updated 10 months ago
- Fine-Tuning Embedding for RAG with Synthetic Data ☆497 Updated last year
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆203 Updated 3 weeks ago
- A small library of LLM judges ☆193 Updated this week
- Automatic evals for LLMs ☆388 Updated this week
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels ☆326 Updated 5 months ago
- Generative Representational Instruction Tuning ☆628 Updated 2 months ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆267 Updated last week
- ☆168 Updated 5 months ago
- This project showcases an LLMOps pipeline that fine-tunes a small-size LLM model to prepare for the outage of the service LLM. ☆304 Updated last month
- Code for explaining and evaluating late chunking (chunked pooling) ☆384 Updated 4 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆320 Updated 6 months ago
- ☆233 Updated last month
- ☆143 Updated 9 months ago
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆445 Updated last week