yishn / chinese-tokenizer
Tokenizes Chinese texts into words.
☆95 Updated last year
Related projects
Alternatives and complementary repositories for chinese-tokenizer
- HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js ☆375 Updated last month
- Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation. ☆104 Updated last year
- A tool to find grammar patterns in Chinese text ☆24 Updated 4 years ago
- CLDR text segmentation for JavaScript ☆38 Updated 6 months ago
- A JavaScript Chinese word segmentation tool based on Python Jieba ☆43 Updated 10 years ago
- Split {Japanese, English} text into sentences. ☆118 Updated 11 months ago
- Chrome extension that translates Chinese words when hovering on them. ☆35 Updated last year
- English lemmatizer ☆65 Updated last year
- Python module that identifies Chinese text as being Simplified or Traditional ☆86 Updated last year
- The 134,000+ words and their pronunciations in the CMU pronouncing dictionary ☆67 Updated 3 years ago
- English Lemma Database - Compiled by Referencing British National Corpus ☆29 Updated last month
- Analyzes the given text and determines its vocabulary level based on CEFR levels ☆42 Updated last year
- Multilingual tokenizer that automatically tags each token with its type ☆61 Updated last year
- Open Chinese Dictionary (開放漢語字典) - Modern Chinese character pronunciation database ☆21 Updated 4 years ago
- Convert a Chinese sentence to Pinyin or Jyutping ☆58 Updated last year
- Han character library for CJKV languages ☆150 Updated 3 years ago
- Chinese lexicon containing definitions, character origins, and statistics, built for Dong Chinese (https://www.dong-chinese.com) ☆39 Updated 4 years ago
- Node.js Interface for CC-CEDICT (http://cc-cedict.org/) ☆26 Updated 7 years ago
- Free, open-source Chinese handwriting recognition in JavaScript ☆142 Updated 5 years ago
- HSK 3.0 Vocabulary Lists (words and characters) ☆71 Updated last year
- A Wordnet API in pure JavaScript ☆108 Updated last year
- Sentence boundary detection in JavaScript for Node.js. http://tessmore.github.io/sbd/ ☆206 Updated last year
- English Part-of-speech (POS) tagger ☆65 Updated last year
- Open Cantonese Dictionary (開放粵語字典) - Modern Cantonese character pronunciation database ☆40 Updated last year
- Implement the supermemo 2 algorithm. ☆81 Updated 2 years ago
- OpenCC implementation for pure Node.js ☆59 Updated 6 years ago
- Chinese (zh-cnm) opendata audio files for 8,596 HSK words and 1,707 syllables. ☆44 Updated 3 years ago
- The JavaScript version of Open Chinese Convert (OpenCC) ☆247 Updated last year
- Python library for CJK (Chinese, Japanese, and Korean) language dictionary ☆82 Updated last week
- JavaScript Lemmatizer is a lemmatization library to retrieve a base form from an English inflected word. ☆65 Updated 3 years ago