OpenMOSS / HalluQA

Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"

☆135

Alternatives and similar repositories for HalluQA

Users who are interested in HalluQA are comparing it to the repositories listed below.

BAAI-Zlab / COIG
☆128 Updated 2 years ago
IronBeliever / CaR
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
☆89 Updated 11 months ago
MikeGu721 / XiezhiBenchmark
☆97 Updated last year
llmeval / LLMEval-1
Chinese Large Language Model Evaluation, Phase 1
☆110 Updated last year
CASIA-LM / MoDS
☆145 Updated last year
CASIA-LM / ChineseWebText
☆179 Updated last year
thu-coai / ComplexBench
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆95 Updated 8 months ago
thu-coai / CritiqueLLM
☆147 Updated last year
OFA-Sys / InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
☆276 Updated 2 years ago
llmeval / LLMEval-2
Chinese Large Language Model Evaluation, Phase 2
☆71 Updated last year
LaVi-Lab / CLEVA
[EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"
☆62 Updated 5 months ago
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆397 Updated 3 months ago
tjunlp-lab / M3KE
A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark
☆102 Updated 2 years ago
sufengniu / RefGPT
☆163 Updated 2 years ago
YJiangcm / FollowBench
[ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
☆114 Updated 4 months ago
Felixgithub2017 / MMCU
MEASURING MASSIVE MULTITASK CHINESE UNDERSTANDING
☆89 Updated last year
Abbey4799 / CELLO
Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024)
☆49 Updated last year
Spico197 / Humpback
🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.
☆139 Updated 5 months ago
pldlgb / nuggets
☆83 Updated last year
zexuanqiu / CLongEval
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
☆44 Updated last year
mutonix / RefGPT
☆98 Updated last year
CLUEbenchmark / SuperCLUE-Math6
SuperCLUE-Math6: Exploring a New Generation of Native Chinese Multi-Turn, Multi-Step Mathematical Reasoning Datasets
☆60 Updated last year
OpenNLG / OpenBA
☆96 Updated 2 years ago
fanqiwan / KCA
EMNLP'2024: Knowledge Verification to Nip Hallucination in the Bud
☆21 Updated last year
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆178 Updated 3 months ago
xv44586 / Chinese-instruction-datasets
Chinese instruction-tuning datasets
☆137 Updated last year
OpenLMLab / ChatZoo
Light local website for displaying performances from different chat models.
☆87 Updated last year
YJiangcm / Lion
[EMNLP 2023] Lion: Adversarial Distillation of Proprietary Large Language Models
☆211 Updated last year
PKU-Baichuan-MLSystemLab / CFBench
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
☆43 Updated last year
csitfun / LogiQA2.0
LogiQA2.0 dataset: logical reasoning in MRC and NLI tasks
☆99 Updated 2 years ago