CASIA-LM / ChineseWebText-2.0

Large-Scale High-quality Chinese Web Text with Multi-dimensional and fine-grained information

☆38

Alternatives and similar repositories for ChineseWebText-2.0

Users who are interested in ChineseWebText-2.0 are comparing it to the repositories listed below.

mutonix / RefGPT
☆98 Updated last year
CASIA-LM / ChineseWebText
☆184 Updated 2 years ago
THUIR / T2Ranking
T2Ranking: A large-scale Chinese benchmark for passage ranking.
☆162 Updated 2 years ago
xlxwalex / FCGEC
The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC Chinese grammatical error correction corpus and STG model
☆120 Updated last year
aplmikex / deduplication_mnbvc
Text deduplication
☆77 Updated last year
FlagOpen / FlagInstruct
☆173 Updated 2 years ago
Felixgithub2017 / MMCU
MEASURING MASSIVE MULTITASK CHINESE UNDERSTANDING
☆89 Updated last year
sufengniu / RefGPT
☆164 Updated 2 years ago
BAAI-Zlab / COIG
☆129 Updated 2 years ago
RUCKBReasoning / GLM-Dialog
☆59 Updated 2 years ago
OpenBMB / DecT
Source code for ACL 2023 paper Decoder Tuning: Efficient Language Understanding as Decoding
☆51 Updated 2 years ago
MikeGu721 / XiezhiBenchmark
☆99 Updated 2 years ago
llmeval / LLMEval-1
Chinese large language model evaluation, phase 1
☆110 Updated 2 years ago
FudanNLPLAB / CBook-150K
MD5 links for a Chinese book corpus
☆218 Updated 2 years ago
thu-coai / CritiqueLLM
☆147 Updated last year
tjunlp-lab / M3KE
A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark
☆104 Updated 2 years ago
IronBeliever / CaR
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
☆90 Updated last year
llmeval / LLMEval-2
Chinese large language model evaluation, phase 2
☆71 Updated 2 years ago
Claude-Liu / ReLM
Rephrasing Language Model for CSC (AAAI 2024)
☆44 Updated last year
BAAI-WuDao / Data
"WuDao" data
☆50 Updated 4 years ago
OpenMOSS / HalluQA
Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"
☆136 Updated last year
TsinghuaAI / CUGE
☆54 Updated 3 years ago
fanqiwan / KCA
EMNLP'2024: Knowledge Verification to Nip Hallucination in the Bud
☆23 Updated last year
Isaac-JL-Chen / rouge_chinese
Python ROUGE Score Implementation for Chinese Language Task (official rouge score)
☆111 Updated last year
HillZhang1999 / NaSGEC
Code & Data for our Paper "NaSGEC: Multi-Domain Chinese Grammatical Error Correction for Native Speaker Texts" (ACL 2023 Findings)
☆96 Updated 11 months ago
blcuicall / CCL2022-CLTC
CCL 2022 Chinese learner text correction evaluation
☆142 Updated 3 years ago
blcuicall / cged_datasets
Datasets from past Chinese grammatical error diagnosis evaluations
☆43 Updated 3 years ago
DAMO-NLP-MT / PolyLM
☆78 Updated 2 years ago
beichao1314 / Open-Llama
The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.
☆67 Updated 2 years ago
zjunlp / IEPile
[ACL 2024] IEPile: A Large-Scale Information Extraction Corpus
☆210 Updated last year