shenfei1010 / CyberCan
CyberCan is a lexicon of contemporary Cantonese based on more than 100 million pieces of internet text from discussion forums in Hong Kong.
☆12Updated 3 years ago
Related projects
Alternatives and complementary repositories for CyberCan
- A Package for Cantonese Tokenisation☆17Updated 3 years ago
- Chinese Dialect Database☆16Updated 7 years ago
- A frequency lexicon for Hong Kong Cantonese☆20Updated 4 years ago
- Twitter dataset for 2022 Russian and Ukrainian crisis☆50Updated 2 years ago
- Raw text of 申報☆18Updated 2 years ago
- ☆19Updated 11 months ago
- Cantonese segmentation tool 粵語分詞工具☆29Updated 4 years ago
- Pre-trained ELECTRA from Hong Kong data☆27Updated 4 years ago
- Code to reproduce analysis done in the article Computational Grounded Theory: A Methodological Framework☆52Updated 6 years ago
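- Packager for the Corpus of Mid-20th Century Hong Kong Cantonese (《香港二十世紀中期粵語語料庫》)☆16Updated 8 years ago
- List of commonly used Chinese characters and words (漢語常用字詞表)☆10Updated last year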
- A curated list of digital things related to the field of Chinese studies.☆30Updated 4 years ago
- BirdSpotter is a Python package which provides an influence and bot detection toolkit for Twitter.☆19Updated 3 years ago
- Loengfan (粵語兩分) is the Cantonese version of the Liang Fen input method☆11Updated 2 years ago
- Text-Based Ideal Points☆44Updated last year
- 💒 Reproducible Extraction of Cross-lingual Topics using R☆20Updated last year
- The official GitHub for the American Stories dataset as in {link}☆108Updated 8 months ago
- Github site with code and data associated with the ASR paper on the Geometry of Culture☆47Updated 4 years ago
- An automation webcrawler based on Selenium library for retrieving parliamentary questions on The Website of Taiwan Legislative Yuan (http…☆8Updated last year
- A simple toolkit for conducting analyses using corpus methods☆24Updated 2 years ago
- Literature 📄 and datasets 📚 on automatic populism detection☆15Updated 8 months ago
- BERT Tokenizer with vocabulary tailored for Cantonese☆19Updated 2 years ago
- Open Chinese Dictionary (開放漢語字典): database of modern Chinese character pronunciations☆21Updated 4 years ago
- Repository for the paper Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions☆16Updated 5 months ago
- Natural Language Processing for Political Science☆21Updated 7 years ago
- 粵文語料篩選器 Cantonese text filter☆33Updated 2 months ago
- This repository contains data of TikTok videos related to the 2024 U.S. Elections☆15Updated this week
- A Python library to add reconstructed pronunciations of Middle Chinese on Chinese texts☆9Updated last year
- The spoken L1 corpus represents present-day spoken Chinese (Putonghua) used in mainland China, which is designed as a comparable corpus t…☆17Updated 3 years ago