xsysigma / TencentLLMEval

TencentLLMEval is a comprehensive benchmark for the human evaluation of large models. It includes task trees, evaluation standards, data verification methods, and more.

☆39

Alternatives and similar repositories for TencentLLMEval

Users who are interested in TencentLLMEval are comparing it to the libraries listed below.


Felixgithub2017 / MMCU
Measuring Massive Multitask Chinese Understanding
☆89 · Updated last year
bojone / NBCE
Naive Bayes-based Context Extension
☆324 · Updated 10 months ago
FlagOpen / FlagInstruct
☆172 · Updated 2 years ago
FudanNLPLAB / CBook-150K
MD5 links for a Chinese book corpus
☆217 · Updated last year
sufengniu / RefGPT
☆163 · Updated 2 years ago
llmeval / LLMEval-1
Chinese large language model evaluation, phase 1
☆110 · Updated 2 years ago
YJiangcm / Lion
[EMNLP 2023] Lion: Adversarial Distillation of Proprietary Large Language Models
☆212 · Updated last year
lemon234071 / clean-dialog
A framework for cleaning Chinese dialog data
☆273 · Updated 4 years ago
xv44586 / Chinese-instruction-datasets
Chinese instruction-tuning datasets
☆137 · Updated last year
FreedomIntelligence / InstructionZoo
☆281 · Updated last year
tjunlp-lab / M3KE
A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark
☆102 · Updated 2 years ago
IronBeliever / CaR
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
☆89 · Updated 11 months ago
BAAI-Zlab / COIG
☆128 · Updated 2 years ago
OpenMOSS / HalluQA
Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"
☆135 · Updated last year
thu-coai / OPD
OPD: Chinese Open-Domain Pre-trained Dialogue Model
☆75 · Updated 2 years ago
genggui001 / Megatron-DeepSpeed-Llama
☆84 · Updated 2 years ago
keezen / ntk_alibi
NTK scaled version of ALiBi position encoding in Transformer.
☆69 · Updated 2 years ago
XueFuzhao / InstructionWild
☆460 · Updated last year
mutonix / RefGPT
☆98 · Updated last year
MikeGu721 / XiezhiBenchmark
☆97 · Updated last year
thu-coai / EVA
EVA: Large-scale Pre-trained Chit-Chat Models
☆307 · Updated 2 years ago
THUIR / T2Ranking
T2Ranking: A large-scale Chinese benchmark for passage ranking.
☆162 · Updated 2 years ago
Chenny0808 / ape210k
This is the repository of the Ape210K dataset and baseline models.
☆195 · Updated 5 years ago
CLUEbenchmark / ZeroCLUE
Zero-shot learning evaluation benchmark, Chinese version
☆57 · Updated 4 years ago
Langboat / mengzi-zero-shot
Zero-shot NLU and NLG based on the mengzi-t5-base-mt pretrained model
☆76 · Updated 3 years ago
yangjianxin1 / LLMPruner
☆311 · Updated 2 years ago
Hzfinfdu / PLMTuningCompetition
Challenge 3: example code and baseline implementation for the large-scale pre-trained model tuning competition
☆37 · Updated 3 years ago
zhoucz97 / awesome-ChatGPT
A curated collection of ChatGPT-related resources
☆56 · Updated 2 years ago
bojone / P-tuning
Simple experiments with the P-tuning method on Chinese
☆140 · Updated 4 years ago
X-PLUG / ChatPLUG
A Chinese Open-Domain Dialogue System
☆324 · Updated 2 years ago