vectara / hallucination-leaderboard
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
☆2,995Updated this week
Alternatives and similar repositories for hallucination-leaderboard
Users who are interested in hallucination-leaderboard are comparing it to the libraries listed below.
- WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus.☆1,538Updated 8 months ago
- ☆4,262Updated 5 months ago
- A unified evaluation framework for large language models☆2,771Updated 2 months ago
- A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxi…☆979Updated last year
- NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.☆5,459Updated last week
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach…☆5,736Updated 2 months ago
- Enforce the output format (JSON Schema, Regex etc) of a language model☆1,976Updated 4 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy☆2,137Updated last year
- An open-source visual programming environment for battle-testing prompts to LLMs.☆2,907Updated last week
- Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach.☆1,130Updated 2 months ago
- Tools for merging pretrained large language models.☆6,673Updated last week
- LiveBench: A Challenging, Contamination-Free LLM Benchmark☆998Updated last week
- MTEB: Massive Text Embedding Benchmark☆3,066Updated this week
- SWE-bench: Can Language Models Resolve Real-world Github Issues?☆4,074Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality☆4,502Updated last year
- Superfast AI decision making and intelligent processing of multi-modal data.☆3,161Updated last month
- Agentless🐱: an agentless approach to automatically solve software development problems☆1,994Updated last year
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆3,015Updated 2 weeks ago
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. Published in Nature.☆3,266Updated 5 months ago
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan…☆1,521Updated last year
- A framework for prompt tuning using Intent-based Prompt Calibration☆2,905Updated last month
- Optimizing inference proxy for LLMs☆3,266Updated 2 weeks ago
- Code and Data for Tau-Bench☆1,048Updated 4 months ago
- [ICLR 2025] Automated Design of Agentic Systems☆1,485Updated 11 months ago
- Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chro…☆2,991Updated last year
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models …☆2,612Updated this week
- All things prompt engineering☆5,723Updated last year
- Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization☆1,382Updated last year
- Synthetic data curation for post-training and structured data extraction☆1,595Updated this week
- ☆2,529Updated this week