yishn / chinese-tokenizer
Tokenizes Chinese texts into words.
☆98Updated 2 years ago
Alternatives and similar repositories for chinese-tokenizer
Users who are interested in chinese-tokenizer compare it to the libraries listed below.
- HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js☆384Updated 9 months ago
- Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.☆111Updated 2 years ago
- A tool to find grammar patterns in Chinese text☆27Updated 5 years ago
- CLDR text segmentation for JavaScript☆38Updated last year
- Python module that identifies Chinese text as being Simplified or Traditional☆96Updated 7 months ago
- JavaScript Lemmatizer is a lemmatization library to retrieve a base form from an English inflected word.☆66Updated 3 years ago
- English Lemma Database - Compiled by Referencing British National Corpus☆31Updated 9 months ago
- Data files for the Dictionary of Frequently-Used Taiwan Minnan (臺灣閩南語常用詞辭典)☆79Updated 2 years ago
- Han character library for CJKV languages☆159Updated 4 years ago
- An experimental webpage for observing Chinese natural language processing. It demonstrates the processes of decomposition, transformation…☆64Updated last year
- Google TTS (Text-To-Speech) for node.js☆286Updated 2 years ago
- Chinese dictionary (中文词典 / 中文詞典). Chinese / Chinese-English dictionaries.☆174Updated last year
- Analyzes the given text and determines its vocabulary level based on CEFR levels☆46Updated 2 years ago
- Text to IPA converter in JavaScript☆57Updated 2 years ago
- Draw animated Japanese characters (Kanji and Kana), Korean characters (Hanja) and Chinese characters (Hanzi) in correct stroke order usin…☆330Updated 5 months ago
- Open Chinese Dictionary (開放漢語字典) - a database of modern Chinese character pronunciations☆23Updated 4 years ago
- The 134,000+ words and their pronunciations in the CMU pronouncing dictionary☆79Updated 3 years ago
- Cantonese text filter (粵文語料篩選器)☆40Updated 3 months ago
- A JavaScript Chinese word segmentation tool based on Python Jieba☆47Updated 11 years ago
- OpenCC implementation for pure Node.js☆63Updated 7 years ago
- A simple API access to the handwriting recognition service of Google IME☆162Updated last year
- FastText for Node.js☆196Updated 2 years ago
- Stream-based library for parsing and manipulating subtitle files☆418Updated 5 months ago
- Node.js Interface for CC-CEDICT (http://cc-cedict.org/)☆27Updated 8 years ago
- A compiled list of language corpora for Taiwanese, Indigenous languages, and Hakka☆42Updated 5 years ago
- Sentence Boundary Detection in javascript for node. http://tessmore.github.io/sbd/☆214Updated last year
- Open Language Profiles — English profile datasets from CEFR-J☆134Updated 5 years ago
- WordNet in JSON format.☆91Updated 4 years ago
- cc-kedict: Creative Commons Korean-English Dictionary☆41Updated 3 years ago
- Gather modern English word frequencies from all enwiki articles.☆218Updated last year