BERT Tokenizer with vocabulary tailored for Cantonese
☆23 · Oct 27, 2022 · Updated 3 years ago
Alternatives and similar repositories for bert-tokenizer-cantonese
Users who are interested in bert-tokenizer-cantonese are comparing it to the libraries listed below.
- An English-to-Cantonese machine translation model ☆55 · Mar 26, 2025 · Updated 11 months ago
- 粵文語料篩選器 Cantonese text filter ☆41 · Feb 4, 2026 · Updated last month
- A Python script for scraping LIHKG ☆32 · Mar 7, 2022 · Updated 3 years ago
- rime-cantonese upstream lexicon repository ☆32 · Dec 24, 2025 · Updated 2 months ago
- CyberCan is a lexicon of contemporary Cantonese based on more than 100 million pieces of internet text from discussion forums in Hong Ko… ☆12 · Aug 24, 2021 · Updated 4 years ago
- Transformers for Cantonese ☆57 · Oct 24, 2020 · Updated 5 years ago
- Cross-platform modular neural network inference library, small and efficient ☆13 · May 15, 2023 · Updated 2 years ago
- Fine-tuning Wav2Vec2.0 on Common Voice (zh-HK) ☆16 · May 8, 2022 · Updated 3 years ago
- ☆13 · Apr 24, 2024 · Updated last year
- Official Repository of UltraVoice ☆58 · Oct 28, 2025 · Updated 4 months ago
- Table of commonly used Chinese characters and words ☆16 · Jun 3, 2023 · Updated 2 years ago
- An audio and transcribed corpus of contemporary Hong Kong Cantonese ☆40 · Dec 30, 2020 · Updated 5 years ago
- 粵語拼音自動標註工具 Cantonese Pronunciation Automatic Labeling Tool ☆81 · Feb 17, 2026 · Updated 2 weeks ago
- A framework for graph-based dependency parsing. ☆18 · Feb 9, 2022 · Updated 4 years ago
- Tools for processing open Cantonese dictionary data provided by words.hk ☆23 · Jan 30, 2025 · Updated last year
- ☆22 · Apr 21, 2022 · Updated 3 years ago
- A spell-checker written in Rust ☆23 · Jan 14, 2022 · Updated 4 years ago
- Alternative-romanisation patch for the Rime Cantonese input method | For users of alternative Cantonese romanisation schemes ☆25 · Sep 29, 2025 · Updated 5 months ago
- Designs, infrastructure, and experiments around Race Logic ☆25 · Jun 25, 2020 · Updated 5 years ago
- Open Cantonese Dictionary: a database of modern Cantonese character pronunciations ☆65 · Mar 30, 2023 · Updated 2 years ago
- Cantonese Input Method for macOS ☆31 · Jan 25, 2025 · Updated last year
- Cantonese-Mandarin unsupervised neural translation for sw project ☆28 · May 2, 2023 · Updated 2 years ago
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆22 · Jan 29, 2026 · Updated last month
- 電腦用漢字粵語拼音表 / Cantonese Pronunciation List of the Characters for Computers ☆62 · Jan 11, 2024 · Updated 2 years ago
- LogicCircuit is a program that helps build/simulate simple circuits using logic gates. It is meant to teach people the basics of how logi… ☆10 · Feb 16, 2026 · Updated 2 weeks ago
- Google Input Tools for macOS ☆32 · Feb 3, 2022 · Updated 4 years ago
- Cantonese segmentation tool 粵語分詞工具 ☆30 · Aug 22, 2020 · Updated 5 years ago
- SurpriseNet: Melody Harmonization Conditioning on User-controlled Surprise Contours ☆28 · May 23, 2025 · Updated 9 months ago
- The package 'data-driven density estimation x' (dddex) turns any standard point forecasting model into an estimator of the underlying con… ☆10 · Dec 1, 2025 · Updated 3 months ago
- Visualizing electric fields in Elm ☆33 · Jul 20, 2022 · Updated 3 years ago
- ☆10 · Jan 20, 2023 · Updated 3 years ago
- A high-level API for interacting with SMT solvers. ☆33 · Dec 8, 2025 · Updated 2 months ago
- ☆41 · May 15, 2023 · Updated 2 years ago
- Source code and demo for the INTERSPEECH 2023 paper DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion P… ☆37 · Dec 5, 2023 · Updated 2 years ago
- Python cross-platform library for Mac/Linux and Windows: complete system commands, send alerts and notifications, set brightness, recording au… ☆11 · Apr 25, 2025 · Updated 10 months ago
- A minimal and interpretable Brian2-based DYNAP neuromorphic processor simulator for educational purposes. ☆12 · Jun 23, 2022 · Updated 3 years ago
- Empirical-Research Toolkit ☆11 · Nov 21, 2025 · Updated 3 months ago
- ☆13 · Jul 17, 2021 · Updated 4 years ago
- Packs a CUDA environment for bytesep music separation and provides a simple GUI. ☆34 · Apr 16, 2022 · Updated 3 years ago