shenfei1010 / CyberCan
CyberCan is a lexicon of contemporary Cantonese based on more than 100 million pieces of internet text from discussion forums in Hong Kong.
☆12Updated 3 years ago
Alternatives and similar repositories for CyberCan:
Users who are interested in CyberCan are comparing it to the repositories listed below:
- Cantonese segmentation tool 粵語分詞工具☆30Updated 4 years ago
- A Package for Cantonese Tokenisation☆17Updated 3 years ago
- ☆21Updated last year
- Chinese Dialect Database☆17Updated 7 years ago
- A frequency lexicon for Hong Kong Cantonese☆21Updated 4 years ago
- Twitter dataset for 2022 Russian and Ukrainian crisis☆49Updated 2 years ago
- Chinese Moral Foundation Dictionary☆17Updated last year
- 粵文語料篩選器 Cantonese text filter☆38Updated last week
- Pre-trained ELECTRA from Hong Kong data☆28Updated 4 years ago
- GitHub site with code and data associated with the ASR paper on the Geometry of Culture☆51Updated 4 years ago
- Raw text of 申報 (Shen Bao)☆25Updated 3 years ago
- fastText vectors created from Hong Kong data.☆21Updated 4 years ago
- 💒 Reproducible Extraction of Cross-lingual Topics using R☆20Updated last year
- Loengfan (粵語兩分) is the Cantonese version of the Liang Fen input method☆12Updated 3 years ago
- The Cantonese Wordnet☆14Updated last year
- BERT Tokenizer with vocabulary tailored for Cantonese☆20Updated 2 years ago
- An automated web crawler based on the Selenium library for retrieving parliamentary questions from the website of Taiwan's Legislative Yuan (http…☆11Updated last year
- ☆47Updated 2 years ago
- ☆54Updated 2 years ago
- Code for measuring novelty in science using publication text☆26Updated last month
- Text-Based Ideal Points☆44Updated 2 years ago
- The Extended Moral Foundations Dictionary (E-MFD)☆40Updated 4 years ago
- U.S. county-level word and topic loadings derived from a 10% Twitter sample from 2009-2015.☆21Updated 3 years ago
- A Python implementation for Structural Topic Modeling☆41Updated 2 years ago
- R Scraper for LIHKG, the Hong Kong version of Reddit.☆16Updated 4 years ago
- English Small World of Words SWOWEN-2018☆66Updated 2 years ago
- Upstream word list repository for rime-cantonese☆27Updated 7 months ago
- Learning structural topic modeling using the stm R package.☆127Updated 7 years ago
- Code to reproduce analysis done in the article Computational Grounded Theory: A Methodological Framework☆56Updated 7 years ago
- An English-to-Cantonese machine translation model☆49Updated 2 weeks ago