open-chinese / chinese-word-structure
Studies the structure of all Chinese characters to provide a complete solution to character-structure problems in NLP.
☆16 Updated last year
Alternatives and similar repositories for chinese-word-structure
Users that are interested in chinese-word-structure are comparing it to the libraries listed below
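- Hanzi decomposition library that breaks Chinese characters into radicals, for use as glyph-shape features in machine learning | Hanzi Decomposition Library allows Chinese characters to be broken down into radicals and components… ☆389 Updated 10 months ago
- Chinese character decomposition dictionary ☆792 Updated 2 years ago
- Get stroke vectors for Chinese characters ☆27 Updated 3 years ago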
- IDS data for CJK Unified Ideographs ☆456 Updated 2 years ago
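- Compares the pronunciation and shape of 6,700 commonly used Chinese characters and outputs lists of similar-sounding and similar-looking characters. # Similar characters ☆465 Updated last year
- The hanzi similar tool. (A tool for computing Chinese character similarity, with an algorithm for visually similar characters. Useful for correcting handwritten character recognition, text obfuscation, and so on.) ☆272 Updated last year
- Development of an automatic Chinese character decomposition system ☆102 Updated last year
- GuwenBERT: A Pre-trained Language Model for Classical Chinese (Literary Chinese) ☆537 Updated 4 years ago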
- ☆15 Updated 6 months ago
- Simple conversion and localization between simplified and traditional Chinese using tables from MediaWiki. ☆545 Updated last year
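- Jiayan (甲言), an NLP toolkit focused on processing Classical Chinese, supporting lexicon construction, word segmentation, part-of-speech tagging, sentence segmentation, and punctuation. Jiayan, the 1st NLP toolkit designed for Classical Chinese, supports lexicon co… ☆621 Updated 3 years ago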
- zi2zi implemented with PyTorch ☆212 Updated last year
- Source code for the paper "Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granular… ☆41 Updated 2 years ago
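- Large-scale Pre-training Corpus for Chinese (100G Chinese pre-training corpus) ☆982 Updated 2 years ago
- Early Modern Chinese corpus dataset: natural language processing, corpus, ancient Chinese, Classical Chinese, digital humanities, computational linguistics ☆165 Updated 6 months ago
- Pinyin-to-Hanzi conversion, pinyin input method engine, pin yin -> 拼音 ☆625 Updated 4 months ago
- Open-source release of the MuCGEC Chinese error-correction dataset and SOTA text-correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Gr… ☆554 Updated 2 years ago
- A large language model fine-tuned from ChineseAlpaca, specializing in Classical Chinese translation and sentence segmentation ☆20 Updated 2 years ago
- Classical Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard ☆54 Updated 2 years ago
- Mapping table between Traditional and Simplified Chinese characters ☆59 Updated 7 months ago
- Full text of the Contemporary Chinese Dictionary (现代汉语词典), 7th edition, in TXT format ☆283 Updated last year
- Chinese NLP datasets, material for everyday experiments. Contributions and pull requests are welcome. ☆36 Updated 3 years ago
- Western scholars generally approach Chinese characters through their components; this repository provides a detailed description and database of Chinese character component decomposition. ☆11 Updated 2 years ago
- Chinese character featurizer that extracts character features (pronunciation and glyph-shape features) for use in deep learning | A Chinese character feature extractor, which extracts the features of Chinese charac… ☆298 Updated 4 years ago
- GuwenModels: a collection of Classical Chinese NLP models, gathering Classical Chinese models and resources found online. A collection of Classical Chinese natural language processing models, including Classical Ch… ☆185 Updated last year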
- This repository contains datasets and baselines for benchmarking Chinese text recognition. ☆493 Updated 2 years ago
- Text correction papers ☆310 Updated last year
- THUOCL (THU Open Chinese Lexicon), a Chinese lexicon ☆962 Updated 2 years ago
- Instance Segmentation for Chinese Character Stroke Extraction, Datasets and Benchmarks. ☆83 Updated 2 years ago
- Uses a language model to correct OCR recognition errors ☆469 Updated 2 years ago