zachary822 / chinese-converter
Converts between traditional and simplified Chinese
☆30 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for chinese-converter
- Python module that identifies Chinese text as being Simplified or Traditional ☆86 · Updated this week
- Tokenizer, POS-tagger, and dependency parser with BERT/RoBERTa/DeBERTa models for Japanese and other languages ☆48 · Updated last month
- Repo for Tibetan corpora ☆21 · Updated last year
- OpusFilter - Parallel corpus processing toolkit ☆102 · Updated 3 months ago
- 🦜 NLP for Tibetan, in Python. ☆32 · Updated last year
- Estimate the phonetic distance between Chinese words and get similar-sounding candidate words. ☆35 · Updated last year
- Improved Sentence Alignment in Linear Time and Space ☆163 · Updated last year
- Uses text-analysis algorithms and Python scripts to automatically correct misspelled English words in Word documents ☆46 · Updated 6 years ago
- An open-access corpus of conversational bilingual speech in Cantonese and English ☆40 · Updated 2 years ago
- ☆28 · Updated last week
- An accurate multilingual word aligner based on LaBSE ☆20 · Updated last year
- People's Daily annotated Chinese corpus, January to April 1998 ☆29 · Updated 6 years ago
- Tokenizer, POS-tagger, and dependency parser for Classical Chinese ☆62 · Updated 3 weeks ago
- Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt). ☆55 · Updated 8 months ago
- Tool to fix bitexts and tag near-duplicates for removal ☆29 · Updated 3 months ago
- 渊 - A project for Classical Chinese ☆94 · Updated 2 years ago
- Pre-trained ELECTRA from Hong Kong data ☆27 · Updated 4 years ago
- Identification and conversion functions for Chinese text processing ☆58 · Updated this week
- Parallel corpus of Classical Chinese texts and their Modern Chinese translations ☆96 · Updated 2 years ago
- Multilingual sentence alignment using sentence embeddings ☆101 · Updated 2 weeks ago
- Convert EPUB files to TXT ☆83 · Updated 4 years ago
- pkuseg多领域中文分词工具; The pkuseg toolkit for multi-domain Chinese word segmentation ☆53 · Updated 2 months ago
- An English lexical database from the Big 🍎, let's go Mets baby love da Mets ☆15 · Updated 2 weeks ago
- Efficient Low-Memory Aligner ☆139 · Updated 2 months ago
- Hong Kong Cantonese Corpus of transcribed speech (spontaneous speech, radio programmes, and a monologue). ☆46 · Updated 8 months ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆67 · Updated 6 months ago
- High-performance Trie and Aho-Corasick automaton (AC automaton) keyword match & replace tool for Python. Correct case insensitive implementa… ☆94 · Updated last month
- A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP ☆85 · Updated 3 years ago
- 🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python ☆58 · Updated 2 months ago