erikavaris / tokenizer
Tokenizer for Twitter and Reddit data
☆46 Updated 6 years ago
Alternatives and similar repositories for tokenizer
Users who are interested in tokenizer are comparing it to the libraries listed below.
- A Dependency Parser for Tweets ☆78 Updated 5 years ago
- Python port of the Twokenize class of ark-tweet-nlp ☆142 Updated 7 years ago
- ☆103 Updated 6 years ago
- The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016… ☆68 Updated 3 years ago
- public repository of the interdisciplinary working group 'Hatespeech' of the research training group UCSM ☆17 Updated 6 years ago
- A framework to identify relations between ideas in temporal text corpora. ☆28 Updated 7 years ago
- A Large Automatically-Constructed Resource of Predicate Paraphrases ☆45 Updated 5 years ago
- Incremental learning of word embeddings with context informativeness. ☆94 Updated 2 years ago
- Quickly extract multi-word phrases from a corpus ☆192 Updated 5 years ago
- Doing things with embeddings ☆66 Updated 2 years ago
- Sparse Additive Generative Model of Text ☆87 Updated 8 years ago
- Computation of the semantic interpretability of topics produced by topic models. ☆180 Updated 8 years ago
- Repository for the CLiPS HAte speech DEtection System [HADES]. ☆24 Updated 7 years ago
- Utility scripts in Python ☆37 Updated last month
- Code and data for inducing domain-specific sentiment lexicons. ☆196 Updated last year
- Mining Argument Structures with Expressive Inference (Linear and LSTM Engines) ☆66 Updated 8 years ago
- Code for the paper "Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings" ☆69 Updated 2 years ago
- ☆54 Updated 3 years ago
- Multi-Annotator Competence Estimation tool ☆63 Updated 6 years ago
- Tokenization and pre-processing for Twitter data used to train classifiers. ☆72 Updated 8 years ago
- Tutorial on computational models of language change ☆115 Updated 6 years ago
- Harassment Lexicon and Corpus ☆30 Updated 7 years ago
- Training Temporal Word Embeddings with a Compass ☆64 Updated 2 years ago
- Sentence specificity prediction ☆25 Updated 6 years ago
- Dataset and code of our EMNLP 2019 paper "Multilingual and Multi-Aspect Hate Speech Analysis" ☆57 Updated 8 months ago
- A set of media framing annotations, along with scripts for obtaining the corresponding news articles ☆52 Updated 6 years ago
- Code for Mimicking Word Embeddings using Subword RNNs (EMNLP 2017) ☆153 Updated 5 years ago
- C++ implementation of Generalised Brown clustering and python scripts for feature generation ☆41 Updated 9 years ago
- Code for learning geographically-informed word embeddings ☆22 Updated 3 years ago
- Unsupervised method for extracting quotation-speaker pairs from large news corpora. ☆29 Updated 7 years ago