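A Chinese character dataset containing information about each character, such as stroke count, radical, pinyin, and English definitions/synonyms.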
☆128 Jul 17, 2020 Updated 5 years ago
Alternatives and similar repositories for zi-dataset
Users that are interested in zi-dataset are comparing it to the libraries listed below
- Chinese Characters Visualization & Chinese Text Augmentation. ☆17 Sep 19, 2022 Updated 3 years ago
- Chinese character structure table ☆18 Jul 16, 2025 Updated 7 months ago
- ☆23 Oct 17, 2024 Updated last year
- Simple voice activity detection (VAD) algorithm in Python ☆15 Aug 10, 2023 Updated 2 years ago
- noise reduction ☆17 Jul 3, 2024 Updated last year
- (WIP) long-form speech generation ☆31 Apr 2, 2025 Updated 11 months ago
- ☆36 Sep 6, 2025 Updated 6 months ago
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E… ☆28 Jul 11, 2025 Updated 7 months ago
- Compute WER and SER for speech recognition evaluation ☆26 Dec 15, 2025 Updated 2 months ago
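- npm library: Chinese character strokes and stroke order ☆20 Dec 30, 2022 Updated 3 years ago
- Hanzi decomposition library that breaks Chinese characters into radicals and components for use as glyph features in machine learning | Hanzi Decomposition Library allows Chinese characters to be broken down into radicals and components… ☆415 Dec 29, 2025 Updated 2 months ago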
- simple energy vad ☆19 Jun 3, 2017 Updated 8 years ago
- Open Source Speech/Text Data on AI ☆19 Sep 13, 2022 Updated 3 years ago
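- Chinese characters code table: encodings for all 20,902 Chinese characters, including full pinyin | Wubi | Zhengma | Unicode | GBK | stroke count | radical | stroke-order code ☆19 Feb 14, 2023 Updated 3 years ago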
- AudioStretchy is a Python wrapper around the `audio-stretch` C library, which performs fast, high-quality time-stretching of WAV/MP3 file… ☆61 Sep 24, 2025 Updated 5 months ago
- Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English. ☆114 Dec 2, 2025 Updated 3 months ago
- Text-To-Speech for NotebookLM ☆39 Jul 20, 2025 Updated 7 months ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications ☆87 Dec 20, 2024 Updated last year
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English. ☆62 Sep 5, 2025 Updated 6 months ago
- We introduce the LLAMA1 Test Set, a comprehensive open-domain world knowledge QA dataset for evaluating question-answering systems. We pr… ☆23 Mar 14, 2024 Updated last year
- MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music ☆26 Jan 7, 2026 Updated last month
- ☆23 Oct 30, 2024 Updated last year
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models ☆27 Feb 13, 2026 Updated 3 weeks ago
- CTC decoder with hotwords for ASR. ☆34 Apr 13, 2025 Updated 10 months ago
- ☆78 Sep 25, 2025 Updated 5 months ago
- A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability. ☆113 Jun 4, 2025 Updated 9 months ago
- Chinese character pinyin data ☆1,435 Feb 23, 2026 Updated last week
- Open, royalty free, lyrics2song / song generation data collection / cleaning pipeline. ☆17 May 9, 2025 Updated 9 months ago
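- Early computers used 7-bit ASCII encoding. To handle Chinese characters, programmers designed GB2312 for Simplified Chinese and Big5 for Traditional Chinese. GB2312 (1980) contains 7,445 characters in total, including 6,763 Chinese characters and 682 other symbols. The internal codes of the Chinese character area range from B0 to F7 in the high byte and from A1 to FE in the low byte, and the code points occupied… ☆10 Sep 10, 2017 Updated 8 years ago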
- semantic tokenizer for speech and music ☆21 Jul 6, 2025 Updated 8 months ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ☆28 Nov 7, 2025 Updated 3 months ago
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction. ☆134 Sep 19, 2025 Updated 5 months ago
- Fetch Myki Balance with Scriptable, add to iOS 14 Widget ☆13 Dec 9, 2022 Updated 3 years ago
- TensorFlow: learn and practice ☆11 Aug 30, 2018 Updated 7 years ago
- Offline Speaker Diarization with SenseVoice by Sherpa ONNX. ☆15 Dec 23, 2024 Updated last year
- Range-based algorithms in Go ☆13 Sep 10, 2023 Updated 2 years ago
- Chinese Word Segmentation task based on BERT and implemented in Pytorch ☆14 Aug 14, 2020 Updated 5 years ago
- Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu… ☆27 Mar 5, 2024 Updated 2 years ago
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark ☆35 May 7, 2025 Updated 9 months ago