tsroten / hanzidentifier
Python module that identifies Chinese text as being Simplified or Traditional
☆91 Updated 4 months ago
Alternatives and similar repositories for hanzidentifier:
Users interested in hanzidentifier are comparing it to the libraries listed below.
- Python library for CJK (Chinese, Japanese, and Korean) language dictionary ☆89 Updated this week
- Hanzi Converter for Traditional and Simplified Chinese ☆185 Updated 5 years ago
- A CWN Python binding with graph structure ☆29 Updated last year
- Constants used in Chinese text processing ☆370 Updated 4 months ago
- A sentence segmentation library with wide language support optimized for speed and utility. ☆61 Updated 7 months ago
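- Data files for the Dictionary of Frequently-Used Taiwan Minnan ☆78 Updated last year
- A compiled list of corpora for Taiwanese, indigenous languages, and Hakka ☆41 Updated 5 years ago
- Phraseg - 一言: new-word discovery toolkit ☆26 Updated 3 years ago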
- Cython wrapper on Hunspell Dictionary ☆66 Updated 9 months ago
- ☆93 Updated 5 months ago
- fastText vectors created from Hong Kong data. ☆21 Updated 4 years ago
- A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP ☆88 Updated 3 years ago
- Great Chinese Dictionary (中華大辭典) ☆119 Updated last year
- 🤖📇 Handles multiple NLP tasks in one pipeline ☆56 Updated last year
- ☆169 Updated 2 weeks ago
- Data files for the Ministry of Education Revised Mandarin Chinese Dictionary; please report suggestions or bugs at moedict-process ☆141 Updated 2 years ago
- 🍳 NLPrep - dataset tool for many natural language processing tasks ☆28 Updated 3 years ago
- Spoken Cantonese from Hong Kong. ☆29 Updated 5 months ago
- Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt). ☆52 Updated last year
- Hanyu Pinyin conversion table ☆39 Updated 4 years ago
- Export UNIHAN's database to csv, json or yaml ☆55 Updated this week
- Taiwanese language tools ☆117 Updated 8 months ago
- Taiwanese language services ☆43 Updated 5 years ago
- Han character library for CJKV languages ☆156 Updated 4 years ago
- A tool for ancient Chinese segmentation. ☆53 Updated 5 years ago
- ☆28 Updated 2 months ago
- A toolbox for working with the Chinese language in Python ☆150 Updated 5 years ago
- OpenCC made with Python ☆551 Updated last year
- Open Source State-of-the-art Chinese Word Segmentation System with BiLSTM and ELMo. https://arxiv.org/abs/1901.05816 ☆42 Updated 3 years ago
- Tokenizer, POS-tagger, and dependency-parser for Classical Chinese ☆13 Updated last year