ayaka14732/cantoseg





Cantonese segmentation tool 粵語分詞工具

☆31

Alternatives and similar repositories for cantoseg

Users who are interested in cantoseg are comparing it to the libraries listed below.


ayaka14732 / lihkg-scraper
A Python script for scraping LIHKG
☆32 · Mar 7, 2022 · Updated 4 years ago
gwinterstein / Cifu
A frequency lexicon for Hong Kong Cantonese
☆25 · Aug 27, 2020 · Updated 5 years ago
paramiai / cantoformer
Transformers for Cantonese
☆58 · Oct 24, 2020 · Updated 5 years ago
CanCLID / awesome-cantonese-nlp
A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP
☆95 · Oct 17, 2021 · Updated 4 years ago
shenfei1010 / CyberCan
CyberCan is a lexicon of contemporary Cantonese based on more than 100 million pieces of internet texts from discussion forums in Hong Ko…
☆12 · Aug 24, 2021 · Updated 4 years ago
toastynews / electra-hongkongese
Pre-trained ELECTRA from Hong Kong data
☆29 · Jul 7, 2020 · Updated 6 years ago
UniversalDependencies / UD_Cantonese-HK
Spoken Cantonese from Hong Kong.
☆30 · May 6, 2026 · Updated 2 months ago
evelynkyl / yue_nmt
Python scripts and datasets of the "Extremely Low-Resource Neural Machine Translation: A Case Study of Cantonese" project
☆16 · Oct 28, 2022 · Updated 3 years ago
wordshk / yue_references
粵語/廣東話參考資料 Reference Materials for Yue / Cantonese
☆15 · Dec 12, 2025 · Updated 7 months ago
indiejoseph / hkcc-corpus
A packager for the Mid-20th Century Hong Kong Cantonese Corpus (《香港二十世紀中期粵語語料庫》)
☆16 · Apr 12, 2016 · Updated 10 years ago
wchan757 / Cantonese_Word_Segmentation
Dictionary for Cantonese word segmentation
☆39 · Jun 4, 2024 · Updated 2 years ago
CanCLID / canto-filter
粵文語料篩選器 Cantonese text filter
☆43 · Feb 4, 2026 · Updated 5 months ago
dohliam / ipa-lookup
Search for pronunciations in different languages
☆11 · Nov 2, 2024 · Updated last year
CanCLID / jyutping.net
粵語拼音輸入法下載網站 | Jyutping Input Method Website
☆15 · Mar 9, 2026 · Updated 4 months ago
meganndare / cantonese-nlp
cantonese-mandarin unsupervised neural translation for sw project
☆29 · May 2, 2023 · Updated 3 years ago
ayaka14732 / bert-tokenizer-cantonese
BERT Tokenizer with vocabulary tailored for Cantonese
☆23 · Oct 27, 2022 · Updated 3 years ago
jacksonllee / pycantonese
Cantonese Linguistics and NLP
☆413 · May 26, 2026 · Updated 2 months ago
UserXiaohu / lda-model
Topic extraction from Chinese text, with classification of new texts according to the extracted topics
☆12 · May 18, 2020 · Updated 6 years ago
esantus / EVALution
Dataset containing Semantic Relations and Metadata, for Training and Evaluating Distributional Semantic Models in English and Mandarin Ch…
☆16 · Aug 7, 2017 · Updated 8 years ago
FudanNLP / NLPCC-WordSeg-Weibo
☆15 · May 13, 2022 · Updated 4 years ago
CanCLID / rime-loengfan
Loengfan (粵語兩分) is the Cantonese version of the Liang Fen input method
☆15 · Mar 3, 2022 · Updated 4 years ago
HLTCHKUST / cantonese-asr
☆103 · Feb 1, 2024 · Updated 2 years ago
olgasilyutina / stm_internet_regulation
Analysis of Russian mass media articles about internet regulation with structural topic modeling
☆11 · May 15, 2018 · Updated 8 years ago
rainfireliang / Weird-Statistics-Questions
Answers to some "weird" statistics questions with R code
☆10 · Jun 8, 2025 · Updated last year
ichitenfont / suppchara
A list of commonly used Hong Kong characters outside standard character sets (常用香港外字表)
☆56 · Sep 7, 2022 · Updated 3 years ago
ayaka14732 / TransCan
An English-to-Cantonese machine translation model
☆55 · Mar 26, 2025 · Updated last year
kfcd / yyzd
Open Cantonese Dictionary (開放粵語字典): a database of modern Cantonese character pronunciations
☆74 · Mar 30, 2023 · Updated 3 years ago
ray1007 / GWE
☆31 · Jun 2, 2018 · Updated 8 years ago
mahoffman / social_network_analysis
☆15 · Oct 9, 2021 · Updated 4 years ago
ivirtex / cupertino_lists
Package that implements iOS-style grouped lists.
☆13 · Apr 22, 2022 · Updated 4 years ago
CanCLID / ToJyutping
粵語拼音自動標註工具 Cantonese Pronunciation Automatic Labeling Tool
☆90 · Feb 17, 2026 · Updated 5 months ago
Lucien-qiang / Rhetoric-Generator
Rhetorically Controlled Encoder-Decoder for Modern Chinese Poetry Generation
☆12 · Jun 4, 2019 · Updated 7 years ago
ckiplab / ckip-transformers
CKIP Transformers
☆768 · Apr 21, 2023 · Updated 3 years ago
ricky52be7 / vs-lihkg
A VS Code extension for browsing LIHKG
☆16 · Jan 22, 2026 · Updated 6 months ago
yanshanjing / ChineseDiachronicCorpus
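ChineseDiachronicCorpus, a Chinese diachronic corpus spanning more than sixty years, including Tencent diachronic news (2000-2016), the People's Daily diachronic corpus (1946-2003), and the Reference News diachronic corpus (1957-2002). Built on a diachronic circulation corpus, it can provide foundational corpus support for computing diachronic language change, language monitoring, and research on social and cultural change…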
☆25 · Jan 10, 2021 · Updated 5 years ago
hassyGo / pytorch-playground
My PyTorch playground for NLP
☆13 · Sep 20, 2018 · Updated 7 years ago
wwbp / county_tweet_lexical_bank
U.S. County level word and topic loading derived from a 10% Twitter sample from 2009-2015.
☆22 · Jun 2, 2021 · Updated 5 years ago
c0re100 / vidcutter
Fork for Telegram usage
☆18 · Dec 10, 2024 · Updated last year
cedoard / snscrape_twitter
Using snscrape and tweepy libraries to scrape unlimited amount of tweets
☆27 · Mar 1, 2021 · Updated 5 years ago