yishn / chinese-tokenizer
Tokenizes Chinese texts into words.
☆100 · Updated 3 years ago
Alternatives and similar repositories for chinese-tokenizer
Users who are interested in chinese-tokenizer are comparing it to the libraries listed below.
- HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js ☆402 · Updated last year
- A tool to find grammar patterns in Chinese text ☆28 · Updated 6 years ago
- Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation. ☆117 · Updated 2 years ago
- English Lemma Database, compiled by referencing the British National Corpus ☆35 · Updated last year
- Python module that identifies Chinese text as being Simplified or Traditional ☆105 · Updated last year
- A JavaScript Chinese word segmentation tool based on Python Jieba ☆51 · Updated 12 years ago
- All the words from Google Books, sorted by frequency ☆126 · Updated 2 years ago
- Han character library for CJKV languages ☆164 · Updated 4 years ago
- Chinese dictionaries (simplified and traditional script): Chinese and Chinese-English dictionaries. ☆224 · Updated 3 weeks ago
- Open Language Profiles: English profile datasets from CEFR-J ☆164 · Updated 5 years ago
- Convert a Chinese sentence to Pinyin or Jyutping ☆64 · Updated 2 years ago
- Analyzes the given text and determines its vocabulary level based on CEFR levels ☆49 · Updated 3 years ago
- Chinese lexicon containing definitions, character origins, and statistics, built for Dong Chinese (https://www.dong-chinese.com) ☆56 · Updated 2 months ago
- Converts English text to IPA notation ☆401 · Updated 2 years ago
- Cantonese linguistics and NLP ☆395 · Updated last year
- Cantonese text (corpus) filter ☆41 · Updated 9 months ago
- Chinese language vocabulary graph generation. Python/Flask tool that performs dictionary search and analysis on Chinese Hanzi characters.… ☆154 · Updated 2 years ago
- Free, open-source Chinese handwriting recognition in JavaScript ☆168 · Updated 6 years ago
- JavaScript Lemmatizer is a lemmatization library to retrieve a base form from an English inflected word. ☆69 · Updated 4 years ago
- Spoken Cantonese from Hong Kong. ☆30 · Updated 2 months ago
- Open Chinese Dictionary: a database of modern Chinese character pronunciations ☆24 · Updated 5 years ago
- British English pronunciation dictionary ☆99 · Updated 8 years ago
- Chrome extension that translates Chinese words when you hover over them. ☆40 · Updated 2 years ago
- Python library for CJK (Chinese, Japanese, and Korean) language dictionaries ☆94 · Updated last week
- A natural language detection library based on trigram statistical analysis for Node.js and the Web. ☆213 · Updated 10 years ago
- Sentence boundary detection in JavaScript for Node.js. http://tessmore.github.io/sbd/ ☆220 · Updated 2 years ago
- Offline bilingual dictionaries made using data from Wiktionary ☆62 · Updated 10 years ago
- Gathers modern English word frequencies from all enwiki articles. ☆227 · Updated last year
- Machine-readable lists of lemma-token pairs in 23 languages. ☆358 · Updated 3 years ago
- A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese ☆93 · Updated 4 years ago