shenfei1010 / CyberCan
CyberCan is a lexicon of contemporary Cantonese based on more than 100 million internet texts from discussion forums in Hong Kong.
☆12 Updated 3 years ago
Alternatives and similar repositories for CyberCan:
Users who are interested in CyberCan compare it to the repositories listed below:
- Raw text of 申報 ☆23 Updated 3 years ago
- Cantonese segmentation tool 粵語分詞工具 ☆29 Updated 4 years ago
- A Package for Cantonese Tokenisation ☆17 Updated 3 years ago
- Twitter dataset for 2022 Russian and Ukrainian crisis ☆49 Updated 2 years ago
- Chinese Dialect Database ☆17 Updated 7 years ago
- Pre-trained ELECTRA from Hong Kong data ☆27 Updated 4 years ago
- BERT Tokenizer with vocabulary tailored for Cantonese ☆20 Updated 2 years ago
- A frequency lexicon for Hong Kong Cantonese ☆21 Updated 4 years ago
- ☆21 Updated last year
- Tools to train and explore diachronic word embeddings from Big Historical Data ☆20 Updated 3 months ago
- An automated web crawler based on the Selenium library for retrieving parliamentary questions on The Website of Taiwan Legislative Yuan (http… ☆11 Updated last year
- BirdSpotter is a Python package which provides an influence and bot detection toolkit for Twitter. ☆19 Updated 3 years ago
- Loengfan (粵語兩分) is the Cantonese version of the Liang Fen input method ☆12 Updated 2 years ago
- Text mining for sociological studies ☆10 Updated 3 years ago
- This repository contains data of TikTok videos related to the 2024 U.S. Elections ☆18 Updated 2 months ago
- Chinese Moral Foundation Dictionary ☆15 Updated last year
- The official GitHub for the American Stories dataset as in {link} ☆112 Updated 10 months ago
- This package consists of functionalities for dynamic topic modelling and its visualization ☆25 Updated 4 years ago
- fastText vectors created from Hong Kong data. ☆21 Updated 4 years ago
- The Extended Moral Foundations Dictionary (E-MFD) ☆36 Updated 4 years ago
- https://sites.google.com/site/multidimensionaltagger ☆31 Updated last year
- Code to reproduce analysis done in the article Computational Grounded Theory: A Methodological Framework ☆53 Updated 7 years ago
- R Scraper for LIHKG, the Hong Kong version of Reddit. ☆16 Updated 4 years ago
- Hong Kong Cantonese Corpus of transcribed speech (spontaneous speech, radio programmes and a monologue). ☆52 Updated 10 months ago
- Digital Outrage Classifier from the Crockett Lab at Yale. Predicts whether tweets contain moral outrage. ☆28 Updated last year
- ☆54 Updated last year
- Taiwanese Translation with BERT based model and RNN. Collection of Taiwanese text corpus ☆11 Updated 2 years ago
- Natural Language Processing for Political Science ☆20 Updated 7 years ago
- 💒 Reproducible Extraction of Cross-lingual Topics using R ☆20 Updated last year
- Additional material for the paper "MoralStrength: Exploiting a Moral Lexicon and Embedding Similarity for Moral Foundations Prediction" ☆53 Updated 2 years ago