ayaka14732/bert-tokenizer-cantonese

ayaka14732 / bert-tokenizer-cantonese

BERT Tokenizer with vocabulary tailored for Cantonese

☆23

Alternatives and similar repositories for bert-tokenizer-cantonese

Users interested in bert-tokenizer-cantonese are comparing it to the repositories listed below.

CanCLID / canto-filter
粵文語料篩選器 Cantonese text filter
☆43 · Feb 4, 2026 · Updated 5 months ago
ayaka14732 / TransCan
An English-to-Cantonese machine translation model
☆55 · Mar 26, 2025 · Updated last year
paramiai / cantoformer
Transformers for Cantonese
☆58 · Oct 24, 2020 · Updated 5 years ago
ayaka14732 / bart-base-jax
JAX implementation of the bart-base model
☆34 · Apr 11, 2023 · Updated 3 years ago
CanCLID / rime-cantonese-upstream
Upstream word-list repository for rime-cantonese
☆33 · Jun 29, 2026 · Updated 3 weeks ago
CanCLID / ToJyutping
粵語拼音自動標註工具 Cantonese Pronunciation Automatic Labeling Tool
☆90 · Feb 17, 2026 · Updated 5 months ago
AlienKevin / wordshk-tools
Tools for processing open Cantonese dictionary data provided by words.hk
☆25 · Jan 30, 2025 · Updated last year
Vocab-Apps / python-pinyin-jyutping-sentence
Convert a Chinese sentence to Pinyin or Jyutping
☆65 · Feb 26, 2023 · Updated 3 years ago
CanCLID / rime-loengfan
Loengfan (粵語兩分) is the Cantonese version of the Liang Fen input method
☆15 · Mar 3, 2022 · Updated 4 years ago
shenfei1010 / CyberCan
CyberCan is a lexicon of contemporary Cantonese based on more than 100 million pieces of internet texts from discussion forums in Hong Ko…
☆12 · Aug 24, 2021 · Updated 4 years ago
lshk-org / jyutping-table
電腦用漢字粵語拼音表 / Cantonese Pronunciation List of the Characters for Computers
☆66 · Jan 11, 2024 · Updated 2 years ago
nk2028 / commonly-used-chinese-characters-and-words
List of commonly used Chinese characters and words
☆16 · Jun 3, 2023 · Updated 3 years ago
chutaklee / CantoASR
Fine-tuning Wav2Vec2.0 on Common Voice (zh-HK)
☆16 · May 8, 2022 · Updated 4 years ago
ayaka14732 / cantoseg
Cantonese segmentation tool 粵語分詞工具
☆31 · Aug 22, 2020 · Updated 5 years ago
danielvarab / uniparse
A framework for graph-based dependency parsing.
☆19 · Feb 9, 2022 · Updated 4 years ago
Papnas / shupin
☆23 · Apr 21, 2022 · Updated 4 years ago
lmaxwell / Armednn
cross-platform modular neural network inference library, small and efficient
☆13 · May 15, 2023 · Updated 3 years ago
CanCLID / rime-cantonese-schemes
中州韻粵語拼音輸入法分歧拼音系統補丁 | For users of alternative Cantonese romanisation schemes
☆27 · Sep 29, 2025 · Updated 9 months ago
lennylxx / google-input-tools-macos
Google Input Tools for macOS
☆34 · Mar 29, 2026 · Updated 3 months ago
past / spellcheck
A spell-checker written in Rust
☆23 · Jan 14, 2022 · Updated 4 years ago
awong-dev / cantodict-archive
Archive of the abandoned Cantonese Dictionary cantodict from https://www.cantonese.sheik.co.uk/dictionary/.
☆24 · Jul 20, 2022 · Updated 4 years ago
fosskers / sly-overlay
Overlay Common Lisp evaluation results.
☆10 · Aug 16, 2025 · Updated 11 months ago
tachoknight / swift-lang-packaging-fedora
All the files necessary to package Apple's Swift Programming language for Fedora
☆13 · Jan 28, 2025 · Updated last year
toastynews / electra-hongkongese
Pre-trained ELECTRA from Hong Kong data
☆29 · Jul 7, 2020 · Updated 6 years ago
voidful / vall-e-encodec
☆41 · May 15, 2023 · Updated 3 years ago
mattparks / Font
Vulkan TTF font rendering using bezier curves
☆12 · Feb 24, 2019 · Updated 7 years ago
oeb25 / smtlib-rs
A high-level API for interacting with SMT solvers.
☆35 · Dec 8, 2025 · Updated 7 months ago
nixberg / blake3-swift
☆13 · Nov 20, 2023 · Updated 2 years ago
MartinEesmaa / FFmpeg-Estonia
Mirror of https://git.ffmpeg.org/ffmpeg.git
☆10 · Jul 11, 2026 · Updated 2 weeks ago
tetutaro / 0xprogen
Japanese Font for programming (0xProto + HackGen)
☆10 · Apr 14, 2024 · Updated 2 years ago
MatthewRock / cl-trie
Common Lisp implementation of Trie data structure.
☆13 · Jan 10, 2023 · Updated 3 years ago
meganndare / cantonese-nlp
cantonese-mandarin unsupervised neural translation for sw project
☆29 · May 2, 2023 · Updated 3 years ago
dmjio / c-ffi-example
Example usage of the Haskell C FFI with hsc2hs
☆12 · Dec 14, 2024 · Updated last year
jamesohortle / UnicodeHover
Hover over a Unicode escape in VS Code to see the glyph of the character, its description and a link to its webpage!
☆12 · Dec 30, 2022 · Updated 3 years ago
michael105 / shrinkelf
Strip 64bit elf binaries aggressively
☆13 · Aug 5, 2021 · Updated 4 years ago
erickguan / pinyin-syllable-segmentation
An implementation of pinyin syllable segmentation (刘政怡, 吴建国 and 刘慧婷, 2008. 音节切分歧义方法研究. 计算机技术与发展, 18(8), pp.35-38.)
☆13 · Apr 8, 2019 · Updated 7 years ago
roehling / git-archive-all
git-archive with recursive submodule support
☆16 · Updated this week
srijan-paul / frametap
Cross platform screen capture library
☆13 · Feb 28, 2025 · Updated last year
Steve-Yuu / Yuu-Gothic
Designed for full support of old-style glyph forms (舊字形). Covers every character in the basic CJK set and Extension A, plus Vietnamese Chữ Nôm and the Hong Kong Supplementary Character Set. Based on the open-source Source Han Sans typeface and modified under the SIL Open Font License.
☆12 · May 29, 2024 · Updated 2 years ago