shenfei1010 / CyberCan
CyberCan is a lexicon of contemporary Cantonese based on more than 100 million pieces of internet text from discussion forums in Hong Kong.
☆12 · Updated 3 years ago
Alternatives and similar repositories for CyberCan
Users who are interested in CyberCan are comparing it to the libraries listed below.
- A Package for Cantonese Tokenisation ☆18 · Updated 4 years ago
- Cantonese segmentation tool 粵語分詞工具 ☆30 · Updated 4 years ago
- Twitter dataset for 2022 Russian and Ukrainian crisis ☆48 · Updated 2 years ago
- BERT Tokenizer with vocabulary tailored for Cantonese ☆22 · Updated 2 years ago
- Chinese Moral Foundation Dictionary ☆18 · Updated last year
- An English-to-Cantonese machine translation model ☆52 · Updated 2 months ago
- Driver for LIWC2015 analysis. LIWC2015 dictionary not included. ☆16 · Updated 2 years ago
- ☆54 · Updated 2 years ago
- Raw text of 申報 ☆26 · Updated 3 years ago
- fastText vectors created from Hong Kong data. ☆21 · Updated 4 years ago
- An automation webcrawler based on Selenium library for retrieving parliamentary questions on The Website of Taiwan Legislative Yuan (http… ☆11 · Updated 2 years ago
- Digital Outrage Classifier from the Crockett Lab at Yale. Predicts whether tweets contain moral outrage. ☆30 · Updated 2 years ago
- Replication code for "Computational analysis of 140 years of US political speeches reveals more positive but increasingly polarized fram… ☆22 · Updated last year
- ☆47 · Updated 3 years ago
- ☆21 · Updated 4 years ago
- BirdSpotter is a python package which provides an influence and bot detection toolkit for twitter. ☆19 · Updated 4 years ago
- Additional material for the paper "MoralStrength: Exploiting a Moral Lexicon and Embedding Similarity for Moral Foundations Prediction" ☆54 · Updated 2 years ago
- Github site with code and data associated with the ASR paper on the Geometry of Culture ☆52 · Updated 5 years ago
- The Extended Moral Foundations Dictionary (E-MFD) ☆40 · Updated 4 years ago
- Fine-tuned transformers for protest event detection. ☆10 · Updated 4 years ago
- Pre-trained ELECTRA from Hong Kong data ☆29 · Updated 4 years ago
- Chinese Dialect Database ☆17 · Updated 8 years ago
- Hong Kong Cantonese Corpus of transcribed speech (spontaneous speech, radio programmes and a monologue). ☆64 · Updated last year
- 粵文語料篩選器 Cantonese text filter ☆40 · Updated 2 months ago
- R Scraper for LIHKG, the Hong Kong version of Reddit. ☆16 · Updated 4 years ago
- The spoken L1 corpus represents present-day spoken Chinese (Putonghua) used in mainland China, which is designed as a comparable corpus t… ☆19 · Updated 3 years ago
- Project repository of the paper "Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning wi… ☆32 · Updated last year
- 💒 Reproducible Extraction of Cross-lingual Topics using R ☆20 · Updated last year
- This repository contains data of TikTok videos related to the 2024 U.S. Elections ☆25 · Updated 4 months ago
- A Cantonese-English translator based on prompt engineering ☆12 · Updated last year