vectara / hallucination-leaderboard
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
☆2,829 Updated this week
Alternatives and similar repositories for hallucination-leaderboard
Users who are interested in hallucination-leaderboard are comparing it to the libraries listed below.
- A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxi… ☆976 Updated last year
- LiveBench: A Challenging, Contamination-Free LLM Benchmark ☆928 Updated last week
- WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus. ☆1,533 Updated 6 months ago
- ☆4,181 Updated 3 months ago
- A framework for prompt tuning using Intent-based Prompt Calibration ☆2,834 Updated 7 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,932 Updated this week
- A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. ☆802 Updated 4 months ago
- A unified evaluation framework for large language models ☆2,743 Updated last month
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,585 Updated 3 weeks ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆2,074 Updated last year
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆3,825 Updated last week
- Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach. ☆1,121 Updated last month
- Agentless🐱: an agentless approach to automatically solve software development problems ☆1,956 Updated 11 months ago
- [ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs ☆1,785 Updated 4 months ago
- A python wrapper for Tavily search API ☆900 Updated last week
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ☆4,223 Updated 2 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆1,013 Updated 6 months ago
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" ☆1,398 Updated 9 months ago
- MTEB: Massive Text Embedding Benchmark ☆2,977 Updated this week
- Using Tree-of-Thought Prompting to boost ChatGPT's reasoning ☆805 Updated last year
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆1,950 Updated last month
- Optimizing inference proxy for LLMs ☆3,157 Updated this week
- Enforce the output format (JSON Schema, Regex etc) of a language model ☆1,952 Updated 2 months ago
- Tools for merging pretrained large language models. ☆6,468 Updated 3 weeks ago
- TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients. Published in Nature. ☆3,097 Updated 3 months ago
- [ICLR 2025] Automated Design of Agentic Systems ☆1,460 Updated 9 months ago
- ☆1,324 Updated last year
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks. ☆1,695 Updated last year
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models ☆2,837 Updated 10 months ago
- Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering" ☆3,901 Updated 11 months ago