mattzheng/ChineseWiki

A curated corpus of Chinese Wikipedia text

☆304

Alternatives and similar repositories for ChineseWiki

Users who are interested in ChineseWiki are comparing it to the repositories listed below.

brightmart / nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
☆9,906 · Feb 6, 2026 · Updated 5 months ago
dalinvip / corpus_process_script
Chinese and English corpus processing scripts in Python, C++, and Java
☆198 · Jan 22, 2019 · Updated 7 years ago
nocoolsandwich / iamQA
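A reading-comprehension question-answering system over Chinese Wikipedia, using an NER model trained on CCKS2016 data, a reading-comprehension model trained on CMRC2018, and W2V word-vector search; deployed with torchserve
☆90 · Jun 4, 2021 · Updated 5 years ago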
Embedding / Chinese-Word-Vectors
100+ pre-trained Chinese word vectors
☆12,230 · Oct 30, 2023 · Updated 2 years ago
InsaneLife / ChineseNLPCorpus
Chinese NLP datasets, material for everyday experiments. Additions and pull requests are welcome.
☆4,605 · Nov 21, 2023 · Updated 2 years ago
CLUEbenchmark / CLUE
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
☆4,273 · Feb 6, 2026 · Updated 5 months ago
ymcui / Chinese-RC-Datasets
Collections of Chinese reading comprehension datasets
☆221 · Dec 19, 2019 · Updated 6 years ago
CLUEbenchmark / CLUEDatasetSearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
☆4,459 · Nov 21, 2022 · Updated 3 years ago
CLUEbenchmark / CLUECorpus2020
Large-scale Pre-training Corpus for Chinese (100G)
☆1,016 · Feb 6, 2026 · Updated 5 months ago
panchunguang / ccks_baidu_entity_link
CCKS Baidu entity linking: first-place solution
☆841 · Dec 19, 2023 · Updated 2 years ago
thunlp / sememe_prediction
Code for Lexical Sememe Prediction via Word Embeddings and Matrix Factorization (IJCAI 2017).
☆60 · Dec 23, 2019 · Updated 6 years ago
kliegr / word_similarity_relatedness_datasets
☆12 · Jul 19, 2018 · Updated 8 years ago
ymcui / cmrc2019
A Sentence Cloze Dataset for Chinese Machine Reading Comprehension (CMRC 2019)
☆126 · Oct 24, 2022 · Updated 3 years ago
ymcui / Chinese-BERT-wwm
Pre-Training with Whole Word Masking for Chinese BERT (the Chinese BERT-wwm model series)
☆10,224 · Apr 19, 2026 · Updated 3 months ago
howl-anderson / chinese-wikipedia-corpus-creator
Corpus creator for Chinese Wikipedia
☆41 · Jun 30, 2021 · Updated 5 years ago
xiulonghan / wordSeg
☆15 · Mar 19, 2017 · Updated 9 years ago
neukg / MultiIE
The source code of the paper "An Effective System for Multi-format Information Extraction".
☆18 · Aug 14, 2021 · Updated 4 years ago
pkumod / CKBQA
A Chinese KBQA dataset with SPARQL annotations.
☆143 · Aug 26, 2019 · Updated 6 years ago
TJUNLP / COER
Chinese Open Entity-Relation Knowledge Base
☆36 · May 22, 2018 · Updated 8 years ago
BAAI-WuDao / P-tuning
Finetune CPM-1
☆24 · Jun 20, 2021 · Updated 5 years ago
chatstack-ai / Chatstack-Doc
Documentation for Chatstack: A Full Pipeline UI for building Chinese NLU System
☆18 · Sep 7, 2019 · Updated 6 years ago
SophonPlus / ChineseNlpCorpus
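Collects, organizes, and publishes Chinese NLP corpora and datasets, working with like-minded contributors to advance Chinese natural language processing.
☆6,592 · Jan 29, 2019 · Updated 7 years ago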
ymcui / Chinese-XLNet
Pre-Trained Chinese XLNet
☆1,647 · Apr 19, 2026 · Updated 3 months ago
nlpformyself / rc_tf
My model code for the Baidu machine reading comprehension competition; placed third in the final
☆14 · Jul 26, 2018 · Updated 7 years ago
Luka0612 / JEAR
Joint Extraction of Entity Mentions and Relations without Dependency Trees
☆18 · Jul 14, 2018 · Updated 8 years ago
wzpan / scnuthesis
A LaTeX template that meets South China Normal University's formatting requirements for master's and doctoral theses.
☆14 · Jan 22, 2015 · Updated 11 years ago
brightmart / roberta_zh
Pre-trained Chinese RoBERTa models: RoBERTa for Chinese
☆2,793 · Jul 22, 2024 · Updated 2 years ago
castorini / TrecQA-NegEx
Code and dataset for SIGIR 2017 short paper "Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Ans…
☆10 · Aug 1, 2017 · Updated 8 years ago
dbiir / UER-py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
☆3,110 · May 9, 2024 · Updated 2 years ago
shibing624 / pycorrector
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error-correction tasks and works out of the box.
☆6,495 · Jun 4, 2026 · Updated last month
ymcui / Chinese-ELECTRA
Pre-trained Chinese ELECTRA
☆1,433 · Apr 19, 2026 · Updated 3 months ago
thu-coai / EVA
EVA: Large-scale Pre-trained Chit-Chat Models
☆304 · Mar 11, 2023 · Updated 3 years ago
pluto-junzeng / ChineseSquad
Chinese machine reading comprehension dataset
☆108 · Mar 29, 2021 · Updated 5 years ago
candlewill / Dialog_Corpus
Datasets for training Chinese and English chatbot systems
☆2,052 · Sep 23, 2020 · Updated 5 years ago
mnhng / HeadFilt
☆12 · Oct 10, 2021 · Updated 4 years ago
LianjiaTech / BELLE
BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)
☆8,277 · Oct 16, 2024 · Updated last year
huangxiangzhou / NLPCC2016KBQA
KBQA based on the NLPCC2016 dataset, including a reimplementation of the NLPCC2016 best team's QA.
☆314 · Feb 17, 2019 · Updated 7 years ago
brightmart / albert_zh
A Lite BERT for Self-supervised Learning of Language Representations: large-scale pre-trained Chinese ALBERT models
☆3,980 · Nov 21, 2022 · Updated 3 years ago
loujie0822 / DeepIE
DeepIE: Deep Learning for Information Extraction
☆1,937 · Dec 9, 2022 · Updated 3 years ago