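Pre-modern Chinese corpus dataset: natural language processing, corpus, ancient Chinese, Classical Chinese, Literary Chinese (文言文), digital humanities, computational language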
☆169 Mar 4, 2025 Updated last year
Alternatives and similar repositories for Pre-modern_Chinese_corpus_dataset
Users that are interested in Pre-modern_Chinese_corpus_dataset are comparing it to the libraries listed below
- Classical Chinese text database ☆324 Feb 3, 2018 Updated 8 years ago
- An evaluation benchmark for Classical Chinese ☆19 Dec 13, 2023 Updated 2 years ago
- A Python toolkit for word segmentation of ancient books in Traditional Chinese ☆36 Jan 3, 2022 Updated 4 years ago
- Ancient Chinese resources ☆17 Feb 25, 2023 Updated 3 years ago
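- Jiayan (甲言), an NLP toolkit focused on processing Classical Chinese (古汉语/古文/文言文/文言), with support for lexicon construction, word segmentation, POS tagging, sentence segmentation, and punctuation. Jiayan, the 1st NLP toolkit designed for Classical Chinese, supports lexicon co… ☆660 Nov 2, 2021 Updated 4 years ago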
- Ancient Chinese Corpus with Word Sense Annotation ☆64 May 29, 2024 Updated last year
- SikuBERT (四库BERT): Pre-training Model of Siku Quanshu (四库全书) ☆154 Jul 30, 2023 Updated 2 years ago
- Annotations and code for the EMNLP 2018 paper 'Weeding out Conventionalized Metaphors: A Corpus of Novel Metaphor Annotations' ☆10 Feb 20, 2023 Updated 3 years ago
- Classical Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard ☆57 Aug 23, 2023 Updated 2 years ago
- A step-by-step problem set for implementing a high-quality deep dependency parser in PyTorch ☆15 Aug 12, 2017 Updated 8 years ago
- Chinese NLP datasets, material for everyday experiments. Additions and pull requests are welcome. ☆37 Dec 3, 2021 Updated 4 years ago
- ☆24 Aug 24, 2023 Updated 2 years ago
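- Guizhou University's “水 纹智识·灵境孪生” project, guided by the team of Yang Xiuzhang (杨秀璋) and advanced jointly through the efforts of Feng Jing (冯静), Luo Enrui (罗恩瑞), Guo Chunshan (郭春山), Li Cancan (李灿灿), and others. The resource applies artificial intelligence to the study of Shui culture, script, and ancient books, and several universities already take part. To better rescue and protect the endangered Shui script and intangible cultural heritage, the author applied for and open-sourced this project, which mainly uses artificial intelligence to recognize Shuishu (水书), building… ☆52 Jun 30, 2025 Updated 8 months ago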
- Raw text of Shen Bao (申報) ☆27 Jan 17, 2022 Updated 4 years ago
- A Benchmark for Classical Chinese Based on a Crowdsourcing System. ☆59 May 25, 2021 Updated 4 years ago
- Tokenizer POS-tagger and Dependency-parser for Classical Chinese ☆15 Dec 30, 2025 Updated 2 months ago
- Python version for Doug Biber's Multidimensional Analysis (MDA) ☆40 Nov 13, 2025 Updated 4 months ago
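- A small program that automatically labels English-language literature with Chinese Library Classification (《中国图书馆分类法》) categories ☆13 Oct 29, 2024 Updated last year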
- ☆22 Jun 2, 2019 Updated 6 years ago
- A very comprehensive parallel corpus of Classical Chinese (文言文/古文) and modern Chinese ☆1,424 Apr 21, 2024 Updated last year
- Tokenizer POS-tagger and Dependency-parser for Classical Chinese ☆19 Feb 28, 2026 Updated 3 weeks ago
- The official GitHub repository for AC-EVAL, an ancient Chinese evaluation suite for large language models (LLMs) ☆16 Nov 12, 2024 Updated last year
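- Data dictionary, Chinese character dictionary, Chinese character library, the 305 poems of the Shijing (诗经), and 250,000 personal names ☆25 Jul 21, 2022 Updated 3 years ago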
- A collection of Chinese NLP resources, corpora, related frameworks, and articles. ☆27 May 20, 2022 Updated 3 years ago
- A large corpus of Chinese fixed phrases and idioms scraped from a reputable educational website (30310 instances). ☆11 Oct 29, 2021 Updated 4 years ago
- Classical Chinese punctuation experiment with Keras using the daizhige (殆知阁古代文献藏书) dataset ☆35 Dec 8, 2022 Updated 3 years ago
- GuwenBERT (古文BERT): A Pre-trained Language Model for Classical Chinese (Literary Chinese) ☆558 Aug 31, 2021 Updated 4 years ago
- Large Scale Chinese Corpus for NLP ☆9,872 Feb 6, 2026 Updated last month
- The most complete corpus of modern Chinese-language poetry on Earth: 3k+ poets, 80K+ poems, 15M+ characters ☆719 Sep 12, 2025 Updated 6 months ago
- Basic classics of ancient China (中國古代基本典籍) ☆83 May 5, 2025 Updated 10 months ago
- A toolset for computation and comparison of Chinese dialects ☆45 Feb 15, 2026 Updated last month
- Buddhist Studies Authority Databases ☆18 Nov 8, 2021 Updated 4 years ago
- A tool for text normalisation via character-level machine translation ☆13 Jun 12, 2020 Updated 5 years ago
- Resource collection for <数字人文教程> (Digital Humanities Tutorial) ☆111 May 28, 2024 Updated last year
- Selected historical documents of modern and contemporary China ☆74 Oct 28, 2023 Updated 2 years ago
- Neural Network Semantic Parser for Almond ☆15 Apr 11, 2019 Updated 6 years ago
- A handwriting font with full support for Hudum Mongolian, Sibe, Manchu and Manchu Ali Gali ☆11 Jun 26, 2022 Updated 3 years ago
- Dataset and detection methods for malicious Chinese web pages ☆21 Mar 4, 2025 Updated last year
- A Corpus for Classical Chinese Language Event Extraction ☆25 Nov 11, 2025 Updated 4 months ago