yishn / chinese-tokenizer
Tokenizes Chinese texts into words.
☆97 · Updated 2 years ago
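The description says only that chinese-tokenizer splits Chinese text into words. It does not say how. Because written Chinese has no spaces between words, a common baseline is dictionary-based forward maximum matching: at each position, take the longest dictionary word that starts there, and fall back to a single character when nothing matches. The sketch below illustrates that general technique. It is not chinese-tokenizer's actual implementation. The function names `build_dictionary` and `tokenize_fmm` and the toy word list are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def build_dictionary(words: Iterable[str]) -> tuple[set[str], int]:
    """Hypothetical helper: return the vocabulary and its longest word length."""
    vocab = set(words)
    max_len = max((len(w) for w in vocab), default=1)
    return vocab, max_len


def tokenize_fmm(text: str, vocab: set[str], max_len: int) -> list[str]:
    """Forward maximum matching over a word dictionary (illustrative sketch)."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: an unknown single character is its own token
        # Try the longest candidate first, shrinking down to two characters.
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in vocab:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


if __name__ == "__main__":
    # Toy dictionary for illustration only.
    vocab, max_len = build_dictionary(["我们", "喜欢", "中文", "分词", "中文分词"])
    print(tokenize_fmm("我们喜欢中文分词", vocab, max_len))
    # -> ['我们', '喜欢', '中文分词']
```

Greedy longest-match is fast but can mis-segment ambiguous strings and cannot recognize words missing from the dictionary. Tools such as Jieba (on which one of the listed JavaScript segmenters is based) combine a dictionary with word frequencies to choose among candidate segmentations instead.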
Alternatives and similar repositories for chinese-tokenizer:
Users who are interested in chinese-tokenizer are comparing it to the libraries listed below.
- A tool to find grammar patterns in Chinese text · ☆27 · Updated 5 years ago
- HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js · ☆381 · Updated 7 months ago
- Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation. · ☆112 · Updated last year
- Convert a Chinese sentence to Pinyin or Jyutping · ☆64 · Updated 2 years ago
- A JavaScript Chinese word segmentation tool based on Python Jieba · ☆45 · Updated 11 years ago
- JavaScript Lemmatizer is a lemmatization library to retrieve a base form from an English inflected word. · ☆66 · Updated 3 years ago
- CLDR text segmentation for JavaScript · ☆38 · Updated last year
- Split {Japanese, English} text into sentences. · ☆125 · Updated last year
- 粵文語料篩選器 Cantonese text filter · ☆40 · Updated last month
- A frequency lexicon for Hong Kong Cantonese · ☆22 · Updated 4 years ago
- Chrome extension that translates Chinese words when hovering on them. · ☆40 · Updated 2 years ago
- Han character library for CJKV languages · ☆158 · Updated 4 years ago
- Python module that identifies Chinese text as being Simplified or Traditional · ☆91 · Updated 5 months ago
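- Upstream wordlist repository for rime-cantonese (rime-cantonese 上游詞表倉庫) · ☆28 · Updated 8 months ago
- 中文词典 / 中文詞典: Chinese / Chinese-English dictionaries. · ☆169 · Updated last year
- Data files for the Dictionary of Frequently-Used Taiwan Minnan (臺灣閩南語常用詞辭典 資料檔) · ☆78 · Updated 2 years ago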
- Text to IPA converter in JavaScript · ☆56 · Updated 2 years ago
- Implements the SuperMemo 2 algorithm. · ☆81 · Updated 2 years ago
- Generate decks for Anki (spaced repetition software) · ☆164 · Updated 2 years ago
- Moedict as a single book (萌典一書) · ☆22 · Updated 5 years ago
- Hanyu Pinyin conversion table (漢語拼音轉換表) · ☆39 · Updated 4 years ago
- Contains HSK 3.0 (HSK 1 to HSK 9) Hanzi, Handwritten, Words and Grammar list, also contains list for Anki decks, with frequency, pinyin, … · ☆127 · Updated last year
- Open Chinese Dictionary: a database of Modern Chinese character pronunciations (開放漢語字典 - 現代漢語字音數據庫) · ☆22 · Updated 4 years ago
- Draw animated Japanese characters (Kanji and Kana), Korean characters (Hanja) and Chinese characters (Hanzi) in correct stroke order usin… · ☆321 · Updated 3 months ago
- A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP · ☆89 · Updated 3 years ago
- The 134,000+ words and their pronunciations in the CMU pronouncing dictionary · ☆79 · Updated 3 years ago
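- Data files for the Ministry of Education's Revised Mandarin Chinese Dictionary (教育部重編國語辭典 資料檔); please report suggestions or bugs at moedict-process · ☆141 · Updated 2 years ago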
- Convert Chinese text to Pinyin or Jyutping · ☆27 · Updated last year
- Node.js Interface for CC-CEDICT (http://cc-cedict.org/) · ☆26 · Updated 8 years ago
- Stroke order SVG files for Chinese Hanzi characters · ☆40 · Updated last year