yishn / chinese-tokenizer
Tokenizes Chinese texts into words.
☆100Updated 2 years ago
Alternatives and similar repositories for chinese-tokenizer
Users who are interested in chinese-tokenizer are comparing it to the libraries listed below.
- HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js☆390Updated last year
- A tool to find grammar patterns in Chinese text☆28Updated 5 years ago
- Text to IPA converter in JavaScript☆58Updated 3 years ago
- Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.☆114Updated 2 years ago
- Analyzes the given text and determines its vocabulary level based on CEFR levels☆47Updated 2 years ago
- Han character library for CJKV languages☆163Updated 4 years ago
- Cantonese text filter (粵文語料篩選器)☆41Updated 6 months ago
- Convert a Chinese sentence to Pinyin or Jyutping☆64Updated 2 years ago
- Open Language Profiles — English profile datasets from CEFR-J☆151Updated 5 years ago
- Data files for the Dictionary of Frequently-Used Taiwan Minnan (臺灣閩南語常用詞辭典)☆80Updated 2 years ago
- Converts English text to IPA notation☆390Updated 2 years ago
- Chinese / Chinese-English dictionaries (中文词典 / 中文詞典).☆197Updated last year
- Open Chinese Dictionary: a database of modern Chinese character pronunciations (開放漢語字典 - 現代漢語字音數據庫)☆24Updated 4 years ago
- Python module that identifies Chinese text as being Simplified or Traditional☆102Updated 10 months ago
- CLDR text segmentation for JavaScript☆38Updated last year
- English Lemma Database - Compiled by Referencing British National Corpus☆32Updated last year
- cc-kedict: Creative Commons Korean-English Dictionary☆41Updated 4 years ago
- All the words from Google Books, sorted by frequency☆118Updated 2 years ago
- Gather modern English word frequencies from all enwiki articles.☆225Updated last year
- Cantonese Linguistics and NLP☆391Updated last year
- Implements the SuperMemo 2 algorithm.☆81Updated 3 years ago
- JavaScript Lemmatizer is a lemmatization library to retrieve a base form from an English inflected word.☆67Updated 4 years ago
- A frequency lexicon for Hong Kong Cantonese☆23Updated 5 years ago
- Converts *.LD2 dictionary files into human-readable text files☆67Updated 12 years ago
- Draw animated Japanese characters (Kanji and Kana), Korean characters (Hanja) and Chinese characters (Hanzi) in correct stroke order usin…☆347Updated last week
- Tool for aligning Chinese transcripts with audio using the AWS transcribe service☆15Updated 3 years ago
- Chinese language vocabulary graph generation. Python/Flask tool that performs dictionary search and analysis on Chinese Hanzi characters.…☆148Updated 2 years ago
- Stroke order SVG files for Chinese Hanzi characters☆44Updated 2 years ago
- Packager for the Corpus of Mid-20th Century Hong Kong Cantonese (《香港二十世紀中期粵語語料庫》)☆16Updated 9 years ago
- A CWN Python binding with graph structure☆35Updated 2 years ago