RUCAIBox / HaluEval
This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.
☆552 · Feb 12, 2024 · Updated 2 years ago
Alternatives and similar repositories for HaluEval
Users who are interested in HaluEval are comparing it to the repositories listed below.
- ☆48 · Jan 7, 2024 · Updated 2 years ago
- Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large … ☆1,076 · Sep 27, 2025 · Updated 4 months ago
- List of papers on hallucination detection in LLMs. ☆1,041 · Jan 11, 2026 · Updated last month
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆415 · Apr 13, 2025 · Updated 10 months ago
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆601 · Jun 26, 2024 · Updated last year
- Token-level Reference-free Hallucination Detection ☆98 · Jul 25, 2023 · Updated 2 years ago
- Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆63 · Dec 25, 2023 · Updated 2 years ago
- Code and data for the FACTOR paper ☆54 · Nov 15, 2023 · Updated 2 years ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆880 · Jan 16, 2025 · Updated last year
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model ☆570 · Jan 28, 2025 · Updated last year
- ☆89 · Nov 11, 2022 · Updated 3 years ago
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ☆82 · Jul 31, 2023 · Updated 2 years ago
- FacTool: Factuality Detection in Generative AI ☆912 · Aug 19, 2024 · Updated last year
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆62 · Mar 30, 2024 · Updated last year
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆537 · Jan 17, 2025 · Updated last year
- ☆43 · Sep 3, 2024 · Updated last year
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ☆136 · Jun 5, 2024 · Updated last year
- Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper ☆309 · May 1, 2025 · Updated 9 months ago
- Do Large Language Models Know What They Don’t Know? ☆102 · Nov 8, 2024 · Updated last year
- ☆282 · Jan 6, 2025 · Updated last year
- Paper List for In-context Learning 🌷 ☆875 · Oct 8, 2024 · Updated last year
- LLM hallucination paper list ☆331 · Mar 11, 2024 · Updated last year
- ☆17 · Dec 21, 2023 · Updated 2 years ago
- [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627 ☆511 · Oct 9, 2024 · Updated last year
- Generative Judge for Evaluating Alignment ☆250 · Jan 18, 2024 · Updated 2 years ago
- ☆58 · Jun 30, 2023 · Updated 2 years ago
- Collection of papers for scalable automated alignment. ☆93 · Oct 22, 2024 · Updated last year
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,550 · May 28, 2023 · Updated 2 years ago
- A Bilingual Role Evaluation Benchmark for Large Language Models ☆43 · Jan 9, 2024 · Updated 2 years ago
- This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback. ☆567 · Oct 28, 2024 · Updated last year
- This is the code for the paper "Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation". ☆37 · Sep 1, 2025 · Updated 5 months ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆150 · Mar 11, 2024 · Updated last year
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆69 · Feb 27, 2024 · Updated last year
- A trend starts from "Chain of Thought Prompting Elicits Reasoning in Large Language Models". ☆2,101 · Oct 5, 2023 · Updated 2 years ago
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆548 · Jun 25, 2024 · Updated last year
- ☆16 · Sep 27, 2023 · Updated 2 years ago
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,594 · Jun 3, 2025 · Updated 8 months ago
- ☆22 · Feb 3, 2024 · Updated 2 years ago
- ☆21 · Aug 19, 2024 · Updated last year