secsilm/zi-dataset

A Chinese character dataset with per-character information such as stroke count, radical, pinyin, and English definitions/synonyms.

☆130

Alternatives and similar repositories for zi-dataset

Users who are interested in zi-dataset are comparing it to the repositories listed below.

chilingg / hanzi-jiegou
Chinese character structure table
☆18 · Updated this week
Kybs0 / HanziDictionary
Fetches all data from a Chinese character dictionary: pinyin, radicals, stroke counts, stroke order, Wubi codes, and definitions
☆25 · Aug 18, 2018 · Updated 7 years ago
pengzhendong / audio-pipeline
☆23 · Oct 17, 2024 · Updated last year
Mddct / simple-tts
(WIP) Long-form speech generation
☆30 · Apr 2, 2025 · Updated last year
MorenoLaQuatra / vad
Simple voice activity detection (VAD) algorithm in Python
☆15 · Aug 10, 2023 · Updated 2 years ago
dengcunqin / noise-reduction
Noise reduction
☆17 · Jul 3, 2024 · Updated 2 years ago
howl-anderson / hanzi_chaizi
Hanzi decomposition library: breaks Chinese characters down into radicals and components for use as glyph features in machine learning
☆423 · Dec 29, 2025 · Updated 6 months ago
Mddct / transformer-vocos
☆35 · Sep 6, 2025 · Updated 10 months ago
open-chinese / chinese-word-structure
Studies the structure of all Chinese characters to provide a complete solution to character-structure problems in NLP.
☆19 · Apr 7, 2024 · Updated 2 years ago
iioSnail / MDCSpell_pytorch
Unofficial implementation of the MDCSpell paper
☆18 · Oct 16, 2022 · Updated 3 years ago
pengzhendong / audiolab
A streaming audio reader, processor, and writer built on top of soundfile and PyAV (bindings for FFmpeg)
☆39 · Mar 31, 2026 · Updated 3 months ago
CNMan / HYDZD
Head-character lookup table for the Hanyu Da Zidian (《汉语大字典》)
☆20 · Nov 29, 2022 · Updated 3 years ago
pengzhendong / wavesurfer
For audio visualization and playback in Jupyter notebooks.
☆18 · Nov 25, 2025 · Updated 8 months ago
kfcd / chaizi
Chinese character decomposition dictionary
☆816 · Jan 8, 2023 · Updated 3 years ago
cdtym / digital-table-of-general-standard-chinese-characters
Digitalization of the Table of General Standard Chinese Characters
☆52 · Dec 15, 2024 · Updated last year
CNMan / XDHYDCD
Head characters and headwords of the Xiandai Hanyu Da Cidian (《现代汉语大词典》)
☆29 · Dec 29, 2020 · Updated 5 years ago
changmenseng / accept_prob
Calculate the probability of a paper being accepted by EMNLP 2023 based on the score distribution of ACL 2023.
☆14 · Sep 7, 2023 · Updated 2 years ago
wenet-e2e / WeSpeech-AI
Open Source Speech/Text Data on AI
☆19 · Sep 13, 2022 · Updated 3 years ago
whmnoe4j / work12
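Early computers used 7-bit ASCII encoding. To handle Chinese characters, programmers designed GB2312 for Simplified Chinese and Big5 for Traditional Chinese. GB2312 (1980) contains 7,445 characters in total: 6,763 Chinese characters and 682 other symbols. In the Chinese character area, the internal code ranges from B0 to F7 for the high byte and from A1 to FE for the low byte, and the occupied code…
☆10 · Sep 10, 2017 · Updated 8 years ago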
lifeiteng / NotebookTTS
Text-To-Speech for NotebookLM
☆39 · Jul 20, 2025 · Updated last year
pengzhendong / torchfa
Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.
☆61 · Sep 5, 2025 · Updated 10 months ago
k2-fsa / sherpa-mlx
sherpa with MLX
☆15 · Aug 2, 2025 · Updated 11 months ago
colaudiolab / AudioSet-R
Official implementation: "AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation"
☆19 · Oct 9, 2025 · Updated 9 months ago
WTree / chineseStroke
Chinese character stroke library
☆86 · Jan 8, 2021 · Updated 5 years ago
ben-hua / general_standard_chinese
Table of General Standard Chinese Characters with pinyin, stroke counts, radicals, and Five Elements (wuxing)
☆27 · Jun 29, 2024 · Updated 2 years ago
buptlj / learn_tf
TensorFlow: learn and practice
☆11 · Aug 30, 2018 · Updated 7 years ago
pengzhendong / ngram-punctuator
An N-gram punctuator for Chinese and English.
☆18 · Oct 14, 2025 · Updated 9 months ago
houbb / nlp-hanzi-similar
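A hanzi similarity tool: computes Chinese character similarity using an algorithm for visually similar characters. It can be used for correcting handwritten character recognition, text obfuscation, and similar tasks.
☆298 · Feb 28, 2024 · Updated 2 years ago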
kanekomasahiro / eb-gec
☆15 · Mar 15, 2022 · Updated 4 years ago
pengzhendong / compute-wer
Compute WER and SER for speech recognition evaluation
☆27 · Jun 6, 2026 · Updated last month
yefeijiang / Chinese-characters-code-table
Chinese characters code table: full pinyin, Wubi, Zhengma, Unicode, GBK, stroke count, radical, stroke-order number, and other encodings for all 20,902 Chinese characters
☆19 · Feb 14, 2023 · Updated 3 years ago
hhguo / SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92 · Dec 20, 2024 · Updated last year
pengzhendong / g2p-mix
Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.
☆115 · Dec 2, 2025 · Updated 7 months ago
Riccorl / chinese-word-segmentation-pytorch
Chinese Word Segmentation task based on BERT and implemented in PyTorch
☆14 · Aug 14, 2020 · Updated 5 years ago
lifeiteng / Aligner-SUPERB
Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark
☆39 · May 7, 2025 · Updated last year
Bartelds / ctc-dro
Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition.
☆17 · May 16, 2025 · Updated last year
Lee-xeo / Chinese-Character-Stroke-Sequence-Dataset
An image dataset that shows Chinese characters being written stroke by stroke, in stroke order
☆33 · May 24, 2024 · Updated 2 years ago
pengzhendong / streaming-ChatTTS
☆23 · Oct 30, 2024 · Updated last year
mapull / chinese-dictionary
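Chinese pinyin dictionary: character pinyin dictionary, word dictionary, idiom dictionary, and a dictionary database of common and polyphonic characters
☆797 · Feb 4, 2025 · Updated last year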