yishn / chinese-tokenizer
Tokenizes Chinese texts into words.
☆96 Updated 2 years ago
Alternatives and similar repositories for chinese-tokenizer:
Users who are interested in chinese-tokenizer are comparing it to the libraries listed below.
- Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation. ☆106 Updated last year
- HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js ☆379 Updated 4 months ago
- rime-cantonese upstream word list repository ☆27 Updated 5 months ago
- A tool to find grammar patterns in Chinese text ☆26 Updated 5 years ago
- 粵文語料篩選器 Cantonese text filter ☆38 Updated last week
- Chrome extension that translates Chinese words when hovering on them. ☆36 Updated last year
- Data files for the Dictionary of Frequently-Used Taiwan Minnan (臺灣閩南語常用詞辭典) ☆76 Updated last year
- 中文词典 / 中文詞典。Chinese / Chinese-English dictionaries. ☆156 Updated 10 months ago
- Chinese (zh-cnm) opendata audio files for 8,596 HSK words and 1,707 syllables. ☆43 Updated 3 years ago
- Convert a Chinese sentence to Pinyin or Jyutping ☆61 Updated last year
- Hanyu Pinyin conversion table ☆36 Updated 3 years ago
- 《国际中文教育中文水平等级标准》 查询系统 Query System of Chinese Proficiency Grading Standards for International Chinese Language Education, New HSK Levels … ☆28 Updated 10 months ago
- Open Chinese Character Dictionary (開放漢語字典): a database of modern Chinese character pronunciations ☆21 Updated 4 years ago
- Free, open-source Chinese handwriting recognition in JavaScript ☆145 Updated 5 years ago
- Han character library for CJKV languages ☆153 Updated 3 years ago
- Chinese lexicon containing definitions, character origins, and statistics, built for Dong Chinese (https://www.dong-chinese.com) ☆43 Updated 4 years ago
- CLDR text segmentation for JavaScript ☆38 Updated 9 months ago
- Unconjugate conjugated Japanese verbs. ☆23 Updated 9 months ago
- Hanzipy is a Chinese character and NLP module for Chinese language processing for Python. It is primarily written to help provide a frame… ☆18 Updated last year
- Python module that identifies Chinese text as being Simplified or Traditional ☆89 Updated 3 months ago
- JavaScript Lemmatizer is a lemmatization library to retrieve a base form from an English inflected word. ☆66 Updated 3 years ago
- A compiled list of corpora for Taiwanese, indigenous languages, and Hakka ☆39 Updated 4 years ago
- Hong Kong Cantonese Corpus of transcribed speech (spontaneous speech, radio programmes and a monologue). ☆52 Updated 11 months ago
- Node.js Interface for CC-CEDICT (http://cc-cedict.org/) ☆26 Updated 7 years ago
- 這棵橡木是松鼠的 (This oak tree belongs to the squirrel) ☆25 Updated 8 years ago
- A frequency lexicon for Hong Kong Cantonese ☆21 Updated 4 years ago
- Draw animated Japanese characters (Kanji and Kana), Korean characters (Hanja) and Chinese characters (Hanzi) in correct stroke order usin… ☆309 Updated last month
- Open Cantonese Character Dictionary (開放粵語字典): a database of modern Cantonese character pronunciations ☆45 Updated last year
- Practice Chinese language grammar ☆16 Updated 3 years ago
- Information and resources relating to conversion of various open source dictionary data to Pleco user dictionary format. ☆28 Updated 3 years ago