tsroten / hanzidentifier
Python module that identifies Chinese text as being Simplified or Traditional
☆91 · Updated 5 months ago
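The listing says what hanzidentifier does but not how. One plausible approach is to check each Han character against tables of characters that exist in only one script. The sketch below is a hypothetical reimplementation of that idea, not the library's API. The names `Script` and `identify` and the five categories are our own choices. The tiny `TRADITIONAL_ONLY` / `SIMPLIFIED_ONLY` tables are assumed sample data: a real tool would need complete character tables.

```python
"""Toy sketch of set-based Simplified/Traditional identification.

ASSUMPTION: the character tables below are tiny hypothetical samples for
illustration only; they are not the tables hanzidentifier ships with.
"""
from enum import Enum


class Script(Enum):
    UNKNOWN = "unknown"          # no Han characters at all
    EITHER = "either"            # only characters valid in both scripts
    TRADITIONAL = "traditional"  # traditional-only chars, no simplified-only ones
    SIMPLIFIED = "simplified"    # simplified-only chars, no traditional-only ones
    MIXED = "mixed"              # both traditional-only and simplified-only chars


# Hypothetical sample pairs whose forms differ between the two scripts.
TRADITIONAL_ONLY = set("說話語國學書車馬門")
SIMPLIFIED_ONLY = set("说话语国学书车马门")


def is_han(ch: str) -> bool:
    """Rough check: basic CJK Unified Ideographs block only (U+4E00..U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def identify(text: str) -> Script:
    """Classify text by which script-specific characters it contains."""
    han = [ch for ch in text if is_han(ch)]
    if not han:
        return Script.UNKNOWN
    has_trad = any(ch in TRADITIONAL_ONLY for ch in han)
    has_simp = any(ch in SIMPLIFIED_ONLY for ch in han)
    if has_trad and has_simp:
        return Script.MIXED
    if has_trad:
        return Script.TRADITIONAL
    if has_simp:
        return Script.SIMPLIFIED
    # Characters absent from both tables are treated as shared by both scripts.
    return Script.EITHER


if __name__ == "__main__":
    for sample in ["Hello", "你好！", "John說：你好！", "John说：你好！", "說话"]:
        print(sample, identify(sample).value)
```

For example, `identify("John說：你好！")` returns `Script.TRADITIONAL` because 說 appears only in the traditional table, while `identify("你好！")` returns `Script.EITHER` since neither character differs between the scripts. Set membership keeps each lookup constant-time, so the cost grows linearly with the text length.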
Alternatives and similar repositories for hanzidentifier:
Users who are interested in hanzidentifier are comparing it to the libraries listed below.
- Python library for CJK (Chinese, Japanese, and Korean) language dictionary ☆90 · Updated this week
- Hanzi Converter for Traditional and Simplified Chinese ☆187 · Updated 5 years ago
- A CWN Python binding with graph structure ☆31 · Updated last year
- Data files for the Dictionary of Frequently-Used Taiwan Minnan (臺灣閩南語常用詞辭典) ☆78 · Updated 2 years ago
- Constants used in Chinese text processing ☆370 · Updated 4 months ago
- Identification and conversion functions for Chinese text processing ☆59 · Updated 5 months ago
- A compiled list of corpora for Taiwanese Hokkien, Taiwanese indigenous languages, and Hakka ☆41 · Updated 5 years ago
- OpenCC made with Python ☆553 · Updated last year
- A sentence segmentation library with wide language support optimized for speed and utility. ☆61 · Updated 8 months ago
- Export UNIHAN's database to CSV, JSON, or YAML ☆57 · Updated this week
- 中華大辭典 (Great Dictionary of Chinese) ☆120 · Updated last year
- ☆169 · Updated last month
- 臺灣言語服務 (Taiwanese language services) ☆43 · Updated 5 years ago
- ☆36 · Updated 11 months ago
- A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP ☆89 · Updated 3 years ago
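- Data files for the Ministry of Education's Revised Mandarin Chinese Dictionary (教育部重編國語辭典); please report suggestions or bugs at moedict-process ☆141 · Updated 2 years ago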
- Han character library for CJKV languages ☆157 · Updated 4 years ago
- Hanyu Pinyin conversion table ☆39 · Updated 4 years ago
- Simple conversion and localization between simplified and traditional Chinese using tables from MediaWiki. ☆539 · Updated last year
- Converts between traditional and simplified Chinese ☆30 · Updated 7 months ago
- Phraseg (一言): a toolkit for new-word discovery ☆26 · Updated 3 years ago
- fastText vectors created from Hong Kong data. ☆21 · Updated 4 years ago
- A tool for ancient Chinese segmentation. ☆53 · Updated 6 years ago
- Open Source State-of-the-art Chinese Word Segmentation System with BiLSTM and ELMo. https://arxiv.org/abs/1901.05816 ☆42 · Updated 3 years ago
- A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars ☆37 · Updated 6 months ago
- ☆93 · Updated 5 months ago
- Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt). ☆52 · Updated last year
- ☆28 · Updated 2 months ago
- 粵文語料篩選器 (Cantonese text filter) ☆40 · Updated last month
- Tokenizer, POS tagger, and dependency parser for Classical Chinese ☆65 · Updated 5 months ago