yishn / chinese-tokenizer
Tokenizes Chinese texts into words.
☆95Updated last year
Related projects
Alternatives and complementary repositories for chinese-tokenizer
- A tool to find grammar patterns in Chinese text☆24Updated 4 years ago
- CLDR text segmentation for JavaScript☆38Updated 6 months ago
- Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.☆103Updated last year
- HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js☆375Updated last month
- Split {Japanese, English} text into sentences.☆118Updated 11 months ago
- Chrome extension that translates Chinese words when hovering on them.☆35Updated last year
- Data files for the Dictionary of Frequently-Used Taiwan Minnan (臺灣閩南語常用詞辭典)☆76Updated last year
- English Lemma Database - Compiled by Referencing British National Corpus☆29Updated last month
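- Open Chinese Dictionary (開放漢語字典): a database of modern Chinese character pronunciations☆21Updated 4 years ago
- Open Cantonese Dictionary (開放粵語字典): a database of modern Cantonese character pronunciations☆40Updated last year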
- Python module that identifies Chinese text as being Simplified or Traditional☆86Updated this week
- A JavaScript Chinese word segmentation tool based on Python Jieba☆43Updated 10 years ago
- English lemmatizer☆65Updated last year
- Free, open-source Chinese handwriting recognition in JavaScript☆142Updated 5 years ago
- 中文词典 / 中文詞典。Chinese / Chinese-English dictionaries.☆144Updated 7 months ago
- Dictionary-related data from the Kanji Database (漢字データベース)☆88Updated last year
- Cantonese Romanization Converter☆14Updated 3 years ago
- Han character library for CJKV languages☆150Updated 3 years ago
- Chinese (zh-cnm) open-data audio files for 8,596 HSK words and 1,707 syllables.☆43Updated 3 years ago
- Data files for the Ministry of Education Revised Mandarin Chinese Dictionary (教育部重編國語辭典); please report suggestions or bugs in moedict-process☆134Updated last year
- The 134,000+ words and their pronunciations in the CMU pronouncing dictionary☆67Updated 3 years ago
- Offline bilingual dictionaries made using data from Wiktionary☆52Updated 9 years ago
- WebAssembly-based JavaScript bindings for Google Compact Language Detector v3☆58Updated 10 months ago
- Analyzes the given text and determines its vocabulary level based on CEFR levels☆43Updated last year
- 中華大辭典 (Great Chinese Dictionary)☆113Updated last year
- Node module wrapper for WordNet dictionary.☆50Updated 2 years ago
- "This oak tree belongs to the squirrels" (這棵橡木是松鼠的)☆25Updated 8 years ago
- JavaScript Lemmatizer is a lemmatization library to retrieve a base form from an English inflected word.☆65Updated 3 years ago
- Python library for CJK (Chinese, Japanese, and Korean) language dictionary☆82Updated this week
- JavaScript implementation of a *.mdx/*.mdd interpreter, with support for mdict index files☆159Updated last month