Tokenizer for Twitter and Reddit data
☆45 · Apr 14, 2019 · Updated 6 years ago
Alternatives and similar repositories for tokenizer
Users who are interested in tokenizer are comparing it to the libraries listed below.
- The Tweets2013 Internet Archive collection ☆10 · Aug 7, 2020 · Updated 5 years ago
- Twitter conversation collection script, which collects all replies to a given tweet ☆68 · Jan 21, 2016 · Updated 10 years ago
- How (but not why) to do Twitter sociolinguistic analysis in the Unix Shell ☆10 · Apr 19, 2016 · Updated 9 years ago
- Python standalone tokenizer ☆15 · Nov 12, 2015 · Updated 10 years ago
- Tweets annotated with coarse-grained sense labels (supersenses) ☆13 · Jun 13, 2014 · Updated 11 years ago
- Python tools for data analysis ☆19 · May 21, 2019 · Updated 6 years ago
- ☆16 · Apr 9, 2019 · Updated 6 years ago
- Software for the paper "Gender and Lexical Variation in Social Media" with David Bamman and Tyler Schnoebelen ☆17 · Nov 10, 2015 · Updated 10 years ago
- A project for clustering text streams using locality-sensitive hashing (LSH) in Python ☆26 · Sep 23, 2011 · Updated 14 years ago
- The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016… ☆68 · May 12, 2022 · Updated 3 years ago
- Multi-Turn-Single-Intent Bert model for dialogue session classification ☆25 · Dec 8, 2022 · Updated 3 years ago
- An implementation of label propagation from the paper "Learning from labeled and unlabeled data with label propagation" ☆20 · Mar 19, 2016 · Updated 9 years ago
- PyTorch implementation of L2R2 in SIGIR 2020 ☆17 · Jun 12, 2023 · Updated 2 years ago
- 🔎 Alfred workflow to search arxiv.org items ☆25 · Aug 30, 2023 · Updated 2 years ago
- ACL'2020: Contextualized Sparse Representations for Real-Time Open-Domain Question Answering ☆49 · Oct 6, 2020 · Updated 5 years ago
- C++ library for modeling with Pitman-Yor processes ☆34 · Nov 28, 2017 · Updated 8 years ago
- Twitter topic search and indexing with Elasticsearch ☆21 · Feb 28, 2017 · Updated 9 years ago
- ☆23 · Apr 26, 2018 · Updated 7 years ago
- Repository for our ICCV 2017 paper: A Read Write Network for Movie Story Understanding ☆85 · Apr 13, 2018 · Updated 7 years ago
- A comprehensive graph of mathematical domains and topics ☆23 · Jan 8, 2022 · Updated 4 years ago
- USFD submission code for Semeval 2016 Task 6, Subtask B ☆26 · Feb 24, 2016 · Updated 10 years ago
- This is the text partitioner project for Python. ☆21 · Dec 11, 2018 · Updated 7 years ago
- ☆25 · Sep 10, 2019 · Updated 6 years ago
- Repository for our ICLR 2019 paper: Discovery of Natural Language Concepts in Individual Units of CNNs ☆26 · Mar 9, 2019 · Updated 6 years ago
- Alembic extension that adds support for arbitrary user-defined objects like views or functions in autogenerate command. ☆12 · Feb 6, 2025 · Updated last year
- ☆27 · Mar 27, 2016 · Updated 9 years ago
- Python port of the Twokenize class of ark-tweet-nlp ☆142 · May 4, 2018 · Updated 7 years ago
- Tokenization and pre-processing for Twitter data used to train classifiers. ☆72 · Sep 28, 2016 · Updated 9 years ago
- Stance Detection with Conditional Encoding ☆71 · Jan 14, 2017 · Updated 9 years ago
- ☆46 · Oct 28, 2024 · Updated last year
- framework for doing NER and other types of entity recognition, in Python ☆68 · Jun 21, 2022 · Updated 3 years ago
- Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenizati… ☆676 · Jun 2, 2025 · Updated 9 months ago
- Code and data for inducing domain-specific sentiment lexicons. ☆196 · Aug 2, 2024 · Updated last year
- Distributed persistent Task Queue running on Dask ☆38 · Apr 23, 2023 · Updated 2 years ago
- ☆10 · Jun 24, 2020 · Updated 5 years ago
- Open-sourced implementation of this paper: https://goo.gl/EQLgnT ☆31 · Jul 22, 2016 · Updated 9 years ago
- European Parliament website Python scraper ☆12 · Oct 19, 2016 · Updated 9 years ago
- Evaluate your word embeddings ☆35 · Dec 3, 2019 · Updated 6 years ago
- 📉 A collection of TensorBoard-related utilities (In Progress) ☆37 · Nov 17, 2022 · Updated 3 years ago