open-chinese / chinese-word-structure
Studies the structure of all Chinese characters, aiming to provide a complete solution to character-structure problems in NLP.
☆16 Updated last year
Alternatives and similar repositories for chinese-word-structure
Users that are interested in chinese-word-structure are comparing it to the libraries listed below
- Chinese character decomposition dictionary (漢語拆字字典) ☆788 Updated 2 years ago
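- Chinese character decomposition library that breaks characters down into radicals and components, for use as glyph features in machine learning | Hanzi Decomposition Library allows Chinese characters to be broken down into radicals and components… ☆386 Updated 9 months ago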
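- Compares 6,700 commonly used Chinese characters by sound and shape, and outputs lists of similar-sounding and similar-looking characters. # Similar characters ☆463 Updated last year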
- Development of an automatic Chinese character decomposition system ☆102 Updated last year
- Large-scale Pre-training Corpus for Chinese 100G ☆977 Updated 2 years ago
- IDS data for CJK Unified Ideographs ☆452 Updated 2 years ago
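- Chinese Hanyu Pinyin dictionary: character pinyin dictionary, word dictionary, idiom dictionary, and a dictionary database of common and polyphonic characters ☆644 Updated 6 months ago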
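- The hanzi similar tool. (A tool for computing Chinese character similarity, with an algorithm for visually similar characters. Can be used for correcting handwritten character recognition, text obfuscation, etc.) ☆271 Updated last year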
- Simple conversion and localization between simplified and traditional Chinese using tables from MediaWiki. ☆544 Updated last year
- Pinyin-to-Hanzi conversion, pinyin input method engine, pin yin -> 拼音 ☆619 Updated 3 months ago
- Chinese character dataset with related information such as stroke count, radical, pinyin, and English definitions/synonyms. ☆122 Updated 5 years ago
- A large language model fine-tuned from ChineseAlpaca, specializing in Classical Chinese translation and sentence segmentation ☆20 Updated last year
- zi2zi implemented with PyTorch ☆212 Updated last year
- Obtain stroke vectors for Chinese characters ☆27 Updated 3 years ago
- Generate Chinese character fonts using GANs ☆88 Updated 2 years ago
- A Chinese string similarity method based on the "sound-shape code" (音形码) ☆227 Updated 5 years ago
- Full text of the Contemporary Chinese Dictionary (《现代汉语词典》, 7th edition) in TXT format ☆282 Updated last year
- THUOCL (THU Open Chinese Lexicon), a Chinese lexicon ☆958 Updated 2 years ago
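- Western scholars commonly approach Chinese characters through their components; this repository provides detailed documentation and a database of Chinese component decomposition. ☆11 Updated 2 years ago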
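- Early Modern Chinese corpus dataset; natural language processing, corpus, ancient Chinese, Classical Chinese, Literary Chinese, digital humanities, computational linguistics ☆165 Updated 5 months ago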
- Mapping table between Traditional and Simplified Chinese characters ☆56 Updated 6 months ago
- GuwenBERT: A Pre-trained Language Model for Classical Chinese (Literary Chinese) ☆538 Updated 3 years ago
- Chinese NLP datasets, material for everyday experiments. Additions and pull requests are welcome. ☆36 Updated 3 years ago
- Chinese character featurizer that extracts pronunciation and glyph features for use in deep learning | A Chinese character feature extractor, which extracts the features of Chinese charac… ☆297 Updated 4 years ago
- Open-source release of the MuCGEC Chinese error-correction dataset and SOTA text-correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Gr… ☆553 Updated 2 years ago
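- Jiayan (甲言), an NLP toolkit focused on processing Classical Chinese, with support for lexicon construction, word segmentation, part-of-speech tagging, sentence segmentation, and punctuation. Jiayan, the 1st NLP toolkit designed for Classical Chinese, supports lexicon co… ☆621 Updated 3 years ago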
- Curated Chinese Wikipedia corpus ☆298 Updated 7 years ago
- Collection of Chinese stop words, common characters, and rare characters ☆172 Updated 6 years ago
- Classical Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard ☆53 Updated last year
- MiniRBT (a series of small Chinese pre-trained models) ☆287 Updated last month