howl-anderson/hanzi_char_featurizer

howl-anderson / hanzi_char_featurizer

A Chinese character feature extractor (featurizer) that extracts pronunciation and glyph features of Chinese characters for use as deep learning features.

☆301

Alternatives and similar repositories for hanzi_char_featurizer

Users who are interested in hanzi_char_featurizer are comparing it to the libraries listed below.

charlesXu86 / char_featurizer
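A feature extraction tool for Chinese characters. It extracts pronunciation features (initial, final, tone), glyph features (components, radicals), Four-Corner codes, and other features, and can feed them to a model as tensors.
☆138 | May 25, 2020 | Updated 6 years ago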
howl-anderson / hanzi_chaizi
A Chinese character decomposition library that breaks characters down into radicals and components, for use as glyph features of Chinese characters in machine learning.
☆421 | Dec 29, 2025 | Updated 6 months ago
kfcd / chaizi
A Chinese character decomposition dictionary.
☆815 | Jan 8, 2023 | Updated 3 years ago
howl-anderson / four_corner_method
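Data and tools for the Chinese Four-Corner Method, which breaks Chinese characters down into glyph-related codes for use as glyph features in machine learning.
☆28 | Dec 20, 2025 | Updated 7 months ago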
contr4l / SimilarCharacter
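Compares the pronunciation and shape of 6,700 commonly used Chinese characters and outputs lists of similar-sounding and similar-looking characters.
☆482 | Mar 28, 2024 | Updated 2 years ago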
zhangyics / Chinese-abbreviation-dataset
A corpus of Chinese abbreviations, including negative full forms.
☆198 | Jul 17, 2021 | Updated 5 years ago
wenyangchou / SimilarCharactor
☆55 | Jun 7, 2021 | Updated 5 years ago
zhanzecheng / Time_NLP
Python 3 version of Time-NLP, which converts Chinese time expressions.
☆520 | Dec 8, 2022 | Updated 3 years ago
qingyujean / ssc
A method for computing the similarity of Chinese strings based on a "sound-shape code".
☆225 | Jul 24, 2020 | Updated 5 years ago
bojone / text_compare
Compares two strings in Python and highlights the differences.
☆27 | Jul 20, 2020 | Updated 6 years ago
zedom1 / Error-Detection
Code for a Chinese error detection module using n-grams and a Bi-LSTM.
☆136 | Mar 31, 2019 | Updated 7 years ago
liuhuanyong / ChineseCixing
WordForm: interfaces for stroke decomposition, radical lookup, and pinyin conversion of Chinese words.
☆67 | Aug 26, 2018 | Updated 7 years ago
liuhuanyong / PersonRelationKnowledgeGraph
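ChinesePersonRelationGraph: person relationship extraction based on NLP methods. A knowledge graph project for relationships between Chinese persons, covering construction of the relationship graph, knowledge-base-driven data labeling, and distant supervision with bootstrappi…
☆932 | Dec 15, 2018 | Updated 7 years ago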
liuhuanyong / ChineseSemanticKB
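ChineseSemanticKB: a Chinese semantic knowledge base comprising 12 categories of million-scale common semantic dictionaries for Chinese processing, including a 340,000-entry abstract-semantics lexicon, a 340,000-entry antonym lexicon, and a 430,000-entry synonym lexicon. It supports applications such as sentence expansion, rewriting, and event abstraction and generalization.
☆783 | Mar 17, 2023 | Updated 3 years ago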
iqiyi / FASPell
A 2019 state-of-the-art spell checker for Simplified and Traditional Chinese: FASPell Chinese Spell Checker (Chinese spell checking, error detection, and error correction).
☆1,224 | Sep 3, 2022 | Updated 3 years ago
1ytic / edit-distance-papers
A curated list of papers dedicated to edit distance as an objective function.
☆53 | Aug 22, 2020 | Updated 5 years ago
sunyilgdx / SIFRank_zh
Keyphrase or keyword extraction: a Chinese keyword extraction method based on pre-trained models (paper: SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-trained La…
☆431 | May 17, 2020 | Updated 6 years ago
ShannonAI / glyce
Code for NeurIPS 2019 - Glyce: Glyph-vectors for Chinese Character Representations
☆425 | Oct 3, 2023 | Updated 2 years ago
liuhuanyong / QueryCorrection
Query spelling correction based on pinyin similarity and edit distance.
☆83 | May 20, 2022 | Updated 4 years ago
Ailln / cn2an
📦 Quickly converts between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more).
☆764 | Apr 23, 2026 | Updated 2 months ago
liuhuanyong / WordMultiSenseDisambiguation
WordMultiSenseDisambiguation: Chinese word sense disambiguation based on an online baike (encyclopedia) knowledge base and semantic embedding similarit…
☆131 | Dec 15, 2018 | Updated 7 years ago
wainshine / Company-Names-Corpus
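A corpus of company names and organization names, covering company short names, abbreviations, brand terms, and enterprise names. It can be used for Chinese word segmentation and organization-name entity recognition.
☆1,294 | Mar 27, 2024 | Updated 2 years ago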
liuhuanyong / ChineseEmbedding
A collection of Chinese NLP embeddings covering five types: character (token), pinyin, word, part-of-speech tag, and dependency-relation embeddings.
☆455 | Dec 15, 2018 | Updated 7 years ago
zhpmatrix / nlp-competitions-list-review
Reviews the top solutions from all NLP competitions. It focuses exclusively on NLP competitions and is continuously updated.
☆2,805 | Apr 4, 2026 | Updated 3 months ago
mozillazg / phrase-pinyin-data
Pinyin data for words and phrases.
☆530 | Jul 20, 2025 | Updated last year
LG-1 / video_music_book_datasets
NLP NER datasets video/music/book bio
☆90 | Jan 3, 2021 | Updated 5 years ago
brightmart / nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
☆9,904 | Feb 6, 2026 | Updated 5 months ago
shibing624 / pycorrector
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error correction and works out of the box.
☆6,493 | Jun 4, 2026 | Updated last month
zhanlaoban / EDA_NLP_for_Chinese
An implementation of the EDA paper for Chinese corpora: an EDA data augmentation tool for Chinese text. NLP data augmentation. Paper reading notes.
☆1,383 | May 31, 2022 | Updated 4 years ago
brightmart / albert_zh
A Lite BERT for Self-Supervised Learning of Language Representations: large-scale Chinese pre-trained ALBERT models.
☆3,979 | Nov 21, 2022 | Updated 3 years ago
dalinvip / cw2vec
cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information
☆274 | Mar 20, 2023 | Updated 3 years ago
deadshot465 / novelcrafter-mcp
An experimental desktop client for using Claude Desktop's MCP with Novelcrafter codices.
☆11 | Dec 3, 2024 | Updated last year
CLUEbenchmark / CLUEPretrainedModels
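A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and models specialized for similarity.
☆810 | Jul 8, 2020 | Updated 6 years ago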
quincyliang / nlp-data-augmentation
Data Augmentation for NLP.
☆294 | Dec 10, 2020 | Updated 5 years ago
didi / ChineseNLP
Datasets and SOTA results for every field of Chinese NLP
☆1,806 | Apr 7, 2022 | Updated 4 years ago
liuhuanyong / AbstractKnowledgeGraph
AbstractKnowledgeGraph: a systematic knowledge graph that concentrates on abstract things, including abstract entities and actions. An abstract knowledge graph; current scale…
☆248 | Aug 6, 2019 | Updated 6 years ago
ZhuiyiTechnology / simbert
A BERT model for retrieval and generation
☆860 | Feb 26, 2021 | Updated 5 years ago
bojone / ee-2019-baseline
A baseline for event subject extraction in the financial domain (CCKS 2019).
☆118 | May 13, 2019 | Updated 7 years ago
ChineseGLUE / ChineseGLUE
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard
☆1,783 | Feb 18, 2023 | Updated 3 years ago