JiangYanting/Pre-modern_Chinese_corpus_dataset

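Pre-modern Chinese · corpus · dataset · natural language processing · NLP corpus · ancient Chinese · Classical Chinese · literary Chinese (wenyan) · digital humanities · computational linguistics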

☆173

Alternatives and similar repositories for Pre-modern_Chinese_corpus_dataset

Users interested in Pre-modern_Chinese_corpus_dataset often compare it to the repositories listed below.

mahavivo / scripta-sinica
A repository of classical Chinese texts
☆350 · Feb 3, 2018 · Updated 8 years ago
jiaeyan / Jiayan
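Jiayan (甲言), an NLP toolkit focused on processing ancient Chinese (Old Chinese / guwen / wenyanwen / wenyan), supporting Classical Chinese lexicon construction, word segmentation, part-of-speech tagging, sentence segmentation, and punctuation. Jiayan, the 1st NLP toolkit designed for Classical Chinese, supports lexicon co…
☆678 · Nov 2, 2021 · Updated 4 years ago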
hsc748NLP / SikuBERT-for-digital-humanities-and-classical-Chinese-information-processing
SikuBERT: a pre-trained language model for the Siku Quanshu (四库全书), also called Siku BERT. Pre-training Model of Siku Quanshu
☆168 · Jul 30, 2023 · Updated 2 years ago
rguthrie3 / DeepDependencyParsingProblemSet
A step-by-step problem set for implementing a high-quality deep dependency parser in PyTorch
☆15 · Aug 12, 2017 · Updated 8 years ago
jaaack-wang / ChineseNLPCorpus
Chinese NLP datasets: material for everyday experiments. Additions via pull request are welcome.
☆38 · Dec 3, 2021 · Updated 4 years ago
iris2hu / Chinese-collocation-complexity
☆24 · Aug 24, 2023 · Updated 2 years ago
eastmountyxz / Sui-AIResearch
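Guizhou University's "Shuiwen Zhishi · Lingjing Luansheng" (水纹智识·灵境孪生) project, supervised by Yang Xiuzhang's team and advanced jointly through the efforts of Feng Jing, Luo Enrui, Guo Chunshan, Li Cancan, and others. The resource applies artificial intelligence to the study of Sui (Shui) culture, script, and ancient books, and several universities are already involved. To better rescue and protect the endangered Sui script and intangible cultural heritage, the authors applied for and open-sourced the project, which mainly uses AI to recognize Shuishu (Sui manuscripts), build…
☆52 · Jun 30, 2025 · Updated last year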
moss-on-stone / shenbao-txt
Raw text of the Shenbao (申報) newspaper
☆27 · Jan 17, 2022 · Updated 4 years ago
jizijing / C-CLUE
A Benchmark for Classical Chinese Based on a Crowdsourcing System.
☆60 · May 25, 2021 · Updated 5 years ago
KoichiYasuoka / GuwenCOMBO
Tokenizer, POS tagger, and dependency parser for Classical Chinese
☆15 · Dec 30, 2025 · Updated 6 months ago
ssharoff / biberpy
Python version for Doug Biber's Multidimensional Analysis (MDA)
☆41 · May 24, 2026 · Updated 2 months ago
yuting-wei / AC-EVAL
The official GitHub repository for AC-EVAL, an ancient Chinese evaluation suite for large language models (LLMs)
☆17 · Nov 12, 2024 · Updated last year
KaijieMo-kj / Ancient-Chinese-Allusion-Resource-Database
This project provides an Ancient Chinese Allusion Resource Library to facilitate the automatic analysis of allusions in classical texts a…
☆24 · Dec 27, 2024 · Updated last year
NiuTrans / Classical-Modern
A very comprehensive Classical Chinese (guwen) to Modern Chinese parallel corpus
☆1,467 · Apr 21, 2024 · Updated 2 years ago
KoichiYasuoka / SuPar-Kanbun
Tokenizer, POS tagger, and dependency parser for Classical Chinese
☆20 · Jun 10, 2026 · Updated last month
ylxie / Classical-Chinese-Poetry-Corpus
Corpus of classical Chinese poetry (shi and ci)
☆28 · Sep 1, 2016 · Updated 9 years ago
WangLaoShi / NLP-Resources-MaterialForChinese
A collection of Chinese NLP resources: corpora, related frameworks, and articles.
☆28 · May 20, 2022 · Updated 4 years ago
frederick-wang / tongjiazi-resources
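CCL 2023, "Research on the Construction and Application of a Corpus of Phonetic Loan Characters (Tongjiazi) in Ancient Chinese": a tongjiazi resource library
☆29 · Sep 23, 2023 · Updated 2 years ago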
garychowcmu / daizhigev20
Daizhige (殆知阁) ancient Chinese texts
☆1,596 · May 13, 2024 · Updated 2 years ago
rime-aca / corpus
Classical Chinese corpus
☆309 · Jun 11, 2022 · Updated 4 years ago
BrikerMan / classic_chinese_punctuate
Classical Chinese punctuation experiment with Keras using the Daizhige (殆知阁古代文献藏书, Daizhige collection of ancient texts) dataset
☆35 · Dec 8, 2022 · Updated 3 years ago
Ethan-yt / guwenbert
GuwenBERT: A Pre-trained Language Model for Classical Chinese (Literary Chinese), also called Guwen BERT
☆566 · Aug 31, 2021 · Updated 4 years ago
sheepzh / poetry
The most comprehensive corpus of modern Chinese-language poetry on Earth: 3k+ poets, 80K+ poems, 15M+ characters
☆734 · Sep 12, 2025 · Updated 10 months ago
brightmart / nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
☆9,906 · Feb 6, 2026 · Updated 5 months ago
clarinsi / csmtiser
A tool for text normalisation via character-level machine translation
☆13 · Jun 12, 2020 · Updated 6 years ago
hsc748NLP / code-for-digital-humanities-tutorial
Resource collection for "Digital Humanities Tutorial" (数字人文教程)
☆119 · May 28, 2024 · Updated 2 years ago
jaaack-wang / Chinese-fixed-phrases-idioms
A large corpus of Chinese fixed phrases and idioms scraped from a reputable educational website (30,310 instances). A large corpus of Chinese idioms (chengyu) and common sayings, containing 3031…
☆15 · Oct 29, 2021 · Updated 4 years ago
sdlyyxy / Chinese-Modern-Contemporary-History-Anthology
Anthology of modern and contemporary Chinese historical documents
☆79 · Oct 28, 2023 · Updated 2 years ago
JiangYanting / Chinese_Malicious_Web_Pages_Dataset_And_Detection
Chinese malicious web page detection dataset and detection methods
☆22 · Mar 4, 2025 · Updated last year
lancopku / Chinese-Dependency-Treebank-with-Ellipsis
An Ellipsis-aware Chinese Dependency Treebank for Web Text
☆26 · May 14, 2018 · Updated 8 years ago
wainshine / Medical-Names-Corpus
Medical corpus: a corpus of medical institution names, plus drug standard codes (药品本位码).
☆70 · Mar 27, 2024 · Updated 2 years ago
Lyn4ever29 / GuwenEE
A Corpus for Classical Chinese Language Event Extraction
☆25 · Nov 11, 2025 · Updated 8 months ago
colaudiolab / DeepLearning4UTI
Deep Learning For Ultrasound Tongue Imaging
☆13 · Dec 17, 2024 · Updated last year
tangxuemei1995 / CHisIEC
CHisIEC: An Information Extraction Corpus for Ancient Chinese History
☆24 · Nov 25, 2025 · Updated 8 months ago
witko0 / kaldifordummies
Simple automatic speech recognition system based on digit corpora (Polish language), created with the Kaldi toolkit. Despite the language d…
☆11 · May 29, 2016 · Updated 10 years ago
falcondai / chinese-char-lm
explores Chinese language models with sub-character level visual information
☆16 · Oct 5, 2018 · Updated 7 years ago
xueyouluo / NER-Deep-Learning
Using BiLSTM-CRF model for Chinese NER
☆15 · Mar 1, 2018 · Updated 8 years ago
pujiaxin33 / JXTransition
Custom transition animations
☆12 · Dec 9, 2015 · Updated 10 years ago
ddbc / Authority-Databases
Buddhist Studies Authority Databases
☆18 · Nov 8, 2021 · Updated 4 years ago