shenfei1010 / CyberCan
CyberCan is a lexicon of contemporary Cantonese built from more than 100 million pieces of internet text from Hong Kong discussion forums.
☆12 · Updated 3 years ago
Alternatives and similar repositories for CyberCan:
Users interested in CyberCan are comparing it to the repositories listed below:
- A Package for Cantonese Tokenisation · ☆17 · Updated 3 years ago
- Cantonese segmentation tool 粵語分詞工具 · ☆29 · Updated 4 years ago
- Raw text of 申報 (Shenbao) · ☆25 · Updated 3 years ago
- Chinese Moral Foundation Dictionary · ☆16 · Updated last year
- (no description) · ☆21 · Updated last year
- BERT tokenizer with vocabulary tailored for Cantonese · ☆20 · Updated 2 years ago
- A frequency lexicon for Hong Kong Cantonese · ☆21 · Updated 4 years ago
- Twitter dataset for the 2022 Russia-Ukraine crisis · ☆49 · Updated 2 years ago
- Hong Kong Cantonese Corpus of transcribed speech (spontaneous speech, radio programmes and a monologue) · ☆54 · Updated 11 months ago
- BirdSpotter is a Python package that provides an influence and bot detection toolkit for Twitter · ☆19 · Updated 3 years ago
- Driver for LIWC2015 analysis. LIWC2015 dictionary not included. · ☆16 · Updated 2 years ago
- The spoken L1 corpus represents present-day spoken Chinese (Putonghua) used in mainland China, which is designed as a comparable corpus t… · ☆18 · Updated 3 years ago
- 粵文語料篩選器 Cantonese text filter · ☆38 · Updated 2 weeks ago
- Pre-trained ELECTRA from Hong Kong data · ☆28 · Updated 4 years ago
- Tools to train and explore diachronic word embeddings from Big Historical Data · ☆22 · Updated last month
- A simple toolkit for conducting analyses using corpus methods · ☆25 · Updated 3 years ago
- The official GitHub repository for the American Stories dataset, as in {link} · ☆114 · Updated 11 months ago
- Loengfan (粵語兩分) is the Cantonese version of the Liang Fen input method · ☆12 · Updated 3 years ago
- fastText vectors created from Hong Kong data · ☆21 · Updated 4 years ago
- Chinese Dialect Database · ☆17 · Updated 7 years ago
- 粵語拼音轉換表 Cantonese romanisation conversion table · ☆31 · Updated 10 months ago
- A Python implementation of Structural Topic Modeling · ☆41 · Updated 2 years ago
- This repository contains data on TikTok videos related to the 2024 U.S. elections · ☆19 · Updated 2 weeks ago
- An automated web crawler based on the Selenium library for retrieving parliamentary questions from the website of Taiwan's Legislative Yuan (http… · ☆11 · Updated last year
- Tokenizer, POS tagger, and dependency parser for Classical Chinese · ☆15 · Updated 3 months ago
- Repository for the paper Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions · ☆16 · Updated 8 months ago
- (no description) · ☆47 · Updated 2 years ago
- Contextualised Word Representations for Lexical Semantic Change Analysis · ☆31 · Updated 4 years ago
- Code and data for the paper: Cross-Partisan Discussions on YouTube: Conservatives Talk to Liberals but Liberals Don't Talk to Conservatives (… · ☆13 · Updated 3 years ago
- GitHub repository with code and data associated with the ASR paper on the Geometry of Culture · ☆50 · Updated 4 years ago