tsroten / hanzidentifier
Python module that identifies Chinese text as being Simplified or Traditional
☆83 · Updated last year
Related projects:
- Hanzi Converter for Traditional and Simplified Chinese ☆180 · Updated 4 years ago
- Python library for CJK (Chinese, Japanese, and Korean) language dictionary ☆81 · Updated this week
- Constants used in Chinese text processing ☆355 · Updated last year
- Identification and conversion functions for Chinese text processing ☆56 · Updated 3 months ago
- Data files for the Dictionary of Frequently-Used Taiwan Minnan (臺灣閩南語常用詞辭典) ☆74 · Updated last year
- Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt). ☆54 · Updated 5 months ago
- List and compilation of corpora for Taiwanese, indigenous languages, and Hakka ☆37 · Updated 4 years ago
- ☆19 · Updated 3 months ago
- Multilingual sentence alignment using sentence embeddings ☆92 · Updated 9 months ago
- ☆28 · Updated 3 months ago
- A CWN Python binding with graph structure ☆25 · Updated last year
- unihandecode is a transliteration library that converts all characters/words in Unicode into the ASCII alphabet and is aware of language prefere… ☆69 · Updated 2 years ago
- Cantonese text corpus filter ☆33 · Updated 2 weeks ago
- OpenCC binding for Python. ☆52 · Updated 4 years ago
- A toolbox for working with the Chinese language in Python ☆146 · Updated 4 years ago
- An English-to-Cantonese machine translation model ☆48 · Updated 5 months ago
- convert epub file to txt ☆80 · Updated 4 years ago
- A simple Python script to translate Chinese to pinyin based on Mandarin.dat ☆207 · Updated 6 months ago
- Export UNIHAN's database to csv, json or yaml ☆52 · Updated this week
- fastText vectors created from Hong Kong data. ☆21 · Updated 4 years ago
- Hanyu Pinyin conversion table ☆34 · Updated 3 years ago
- A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP ☆84 · Updated 2 years ago
- Open Source State-of-the-art Chinese Word Segmentation System with BiLSTM and ELMo. https://arxiv.org/abs/1901.05816 ☆39 · Updated 3 years ago
- Spoken Cantonese from Hong Kong. ☆28 · Updated 4 months ago
- Sentence aligner ☆106 · Updated 3 years ago
- Chinese stopwords collection ☆128 · Updated 4 years ago
- fastlangid, the only language identification package that supports Cantonese (zh-yue), Simplified (zh-hans) and Traditional Chinese (zh-ha… ☆38 · Updated last year
- A Python module to reduce Unicode to a 'good enough' ASCII representation (outdated GitHub copy) ☆36 · Updated 13 years ago
- An open solution for collecting n-gram Chinese lexicon and n-gram statistics ☆73 · Updated 8 years ago
- OpenCC made with Python ☆532 · Updated 9 months ago