erikavaris / tokenizer
Tokenizer for Twitter and Reddit data
☆45 · Updated 5 years ago
Alternatives and similar repositories for tokenizer:
Users who are interested in tokenizer are comparing it to the libraries listed below.
- A Dependency Parser for Tweets ☆78 · Updated 5 years ago
- Multi-Annotator Competence Estimation tool ☆63 · Updated 5 years ago
- C++ implementation of Generalised Brown clustering and python scripts for feature generation ☆41 · Updated 8 years ago
- The Yahoo News Annotated Comments Corpus (YNACC) ☆18 · Updated 6 years ago
- Public repository of the interdisciplinary working group 'Hatespeech' of the research training group UCSM ☆17 · Updated 6 years ago
- A Large Automatically-Constructed Resource of Predicate Paraphrases ☆45 · Updated 4 years ago
- Python port of the Twokenize class of ark-tweet-nlp ☆141 · Updated 6 years ago
- Unsupervised method for extracting quotation-speaker pairs from large news corpora. ☆29 · Updated 6 years ago
- Corpus and annotations for the CL-Aff Shared Task from the University of Pennsylvania ☆19 · Updated 3 years ago
- A framework to identify relations between ideas in temporal text corpora. ☆28 · Updated 6 years ago
- Sentence specificity prediction ☆25 · Updated 6 years ago
- Predict edit intentions on Wikipedia ☆19 · Updated 6 years ago
- A natural language processing tool for automatically detecting quotations in text. ☆15 · Updated 3 years ago
- Regex-like pattern matching on a sentence's parse tree instead of on strings ☆42 · Updated 7 years ago
- ☆104 · Updated 6 years ago
- Code and data for ACL2016 article "Which argument is more convincing? Analyzing and predicting convincingness of Web arguments using bidi… ☆28 · Updated 8 years ago
- Code to reproduce experiments from the EMNLP 2015 paper about Rumour Stance Classification with Gaussian Processes. ☆36 · Updated 8 years ago
- The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016… ☆66 · Updated 2 years ago
- Training Temporal Word Embeddings with a Compass ☆64 · Updated 2 years ago
- Data and scripts for the shared task "Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)" at SemEval 2015 ☆43 · Updated 4 years ago
- Automatic labeling for topic models ☆57 · Updated 9 years ago
- ☆56 · Updated 6 years ago
- Mining Argument Structures with Expressive Inference (Linear and LSTM Engines) ☆65 · Updated 7 years ago
- ☆54 · Updated 9 years ago
- Twpipe is a pipeline toolkit that parses raw tweets into universal dependencies. ☆28 · Updated 5 years ago
- Incremental learning of word embeddings with context informativeness. ☆94 · Updated last year
- Sparse Additive Generative Model of Text ☆87 · Updated 8 years ago
- Counter-fitting Word Vectors to Linguistic Constraints ☆144 · Updated 4 years ago
- An Easy to Use, Accurate Python Geolocation Library ☆41 · Updated 2 years ago
- A collection of English tweets annotated in Universal Dependencies. ☆39 · Updated 3 years ago