vectara/hallucination-leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents

☆3,288

Alternatives and similar repositories for hallucination-leaderboard

Users who are interested in hallucination-leaderboard are comparing it to the libraries listed below.

EdinburghNLP / awesome-hallucination-detection
List of papers on hallucination detection in LLMs.
☆1,120 · Jun 6, 2026 · Updated last month
huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
☆5,639 · May 26, 2026 · Updated last month
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,293 · Updated this week
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆7,250 · Jun 17, 2026 · Updated last month
openai / simple-evals
☆4,576 · Apr 22, 2026 · Updated 2 months ago
RUCAIBox / HaluEval
This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.
☆592 · Feb 12, 2024 · Updated 2 years ago
run-llama / llama_index
LlamaIndex is the leading document agent and OCR platform
☆50,962 · Updated this week
guidance-ai / guidance
A guidance language for controlling large language models.
☆21,688 · May 21, 2026 · Updated 2 months ago
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆13,359 · Jul 13, 2026 · Updated last week
letta-ai / letta
Platform for stateful agents: AI with advanced memory that can learn and self-improve over time.
☆23,903 · Jul 3, 2026 · Updated 2 weeks ago
vibrantlabsai / ragas
Supercharge Your LLM Application Evaluations 🚀
☆14,935 · Feb 24, 2026 · Updated 4 months ago
BerriAI / litellm
The fastest, litest AI Gateway. Rust core with Python SDK. Call 100+ LLM APIs in OpenAI (or native) format with cost tracking, guardrails…
☆54,241 · Updated this week
microsoft / autogen
A programming framework for agentic AI
☆59,873 · Apr 15, 2026 · Updated 3 months ago
lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,496 · May 1, 2026 · Updated 2 months ago
axolotl-ai-cloud / axolotl
Go ahead and axolotl questions
☆12,222 · Updated this week
gkamradt / needle-in-a-haystack
Doing simple retrieval from LLM models at various context lengths to measure accuracy
☆2,347 · Jun 8, 2026 · Updated last month
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆86,804 · Updated this week
dottxt-ai / outlines
Structured Outputs
☆14,833 · Updated this week
unslothai / unsloth
Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.
☆68,666 · Updated this week
HillZhang1999 / llm-hallucination-survey
Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large …
☆1,085 · Sep 27, 2025 · Updated 9 months ago
confident-ai / deepeval
The LLM Evaluation Framework
☆17,006 · Updated this week
mit-han-lab / streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
☆7,248 · Jul 11, 2024 · Updated 2 years ago
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,583 · Updated this week
openai / evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
☆18,962 · Apr 14, 2026 · Updated 3 months ago
microsoftarchive / promptbench
A unified evaluation framework for large language models
☆2,815 · Feb 20, 2026 · Updated 5 months ago
langchain-ai / langchain
The agent engineering platform.
☆142,257 · Updated this week
huggingface / trl
Train transformer language models with reinforcement learning.
☆18,898 · Updated this week
meta-llama / llama-cookbook
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, RAG. We als…
☆18,481 · May 19, 2026 · Updated 2 months ago
haotian-liu / LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,932 · Aug 12, 2024 · Updated last year
huggingface / peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆21,426 · Updated this week
deepspeedai / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆42,754 · Updated this week
langchain-ai / opengpts
☆6,739 · Jun 26, 2025 · Updated last year
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,878 · Mar 21, 2026 · Updated 4 months ago
mem0ai / mem0
Universal memory layer for AI Agents
☆61,383 · Updated this week
mistralai / mistral-inference
Official inference library for Mistral models
☆10,830 · Jun 16, 2026 · Updated last month
microsoft / LLMLingua
[EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach…
☆6,459 · Apr 8, 2026 · Updated 3 months ago
vectara / getting-started
Examples of how to use the Vectara platform in several common programming languages
☆52 · Aug 23, 2024 · Updated last year
ShishirPatil / gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
☆12,955 · Apr 13, 2026 · Updated 3 months ago
567-labs / instructor
structured outputs for llms
☆13,593 · Jul 13, 2026 · Updated last week