delvinso / covid19_unique_tweets
An ongoing dataset of hashtags, n-gram counts, and other miscellaneous NLP resources for COVID-19 analysis, derived from over 100,000,000 tweets collected since mid-January 2020.
☆57 · Updated 3 years ago
Alternatives and similar repositories for covid19_unique_tweets:
Users who are interested in covid19_unique_tweets are comparing it to the repositories listed below.
- Getting recommendations from natural language ☆123 · Updated 4 years ago
- a bot that generates realistic replies using a combination of pretrained GPT-2 and BERT models ☆193 · Updated 4 years ago
- DeEpLearning models for MultIlingual haTespeech (DELIMIT): Benchmarking multilingual models across 9 languages and 16 datasets. ☆109 · Updated last year
- Cleans Reddit Text Data ☆81 · Updated 4 years ago
- Topic Inference with Zeroshot models ☆61 · Updated last year
- Explainable Zero-Shot Topic Extraction ☆62 · Updated 7 months ago
- a contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (Bert, Universal Sen… ☆230 · Updated 2 years ago
- Datasets I have created for scientific summarization, and a trained BertSum model ☆115 · Updated 5 years ago
- Creating class-based TF-IDF matrices ☆83 · Updated 2 years ago
- Clean personally identifiable information from dirty dirty text using spaCy. ☆41 · Updated last year
- Interpretable data visualizations for understanding how texts differ at the word level ☆274 · Updated last month
- Browse Covid-19 & SARS-CoV-2 Scientific Papers with Transformers 🦠 📖 ☆182 · Updated 2 years ago
- Pretrained BERT model for analysing COVID-19 Twitter data ☆184 · Updated 2 years ago
- spaCy pipeline object for negating concepts in text ☆279 · Updated 9 months ago
- The world's largest social media toxicity dataset. ☆177 · Updated 2 years ago
- A repository to house model building experiments and tools that are part of the Conversation AI effort. ☆139 · Updated 2 weeks ago
- Code release for "A Time-Aware Transformer Based Model for Suicide Ideation Detection on Social Media", EMNLP 2020. ☆54 · Updated 4 years ago
- Notebooks configured to be run with Binder, usually found on my blog. ☆42 · Updated 2 years ago
- Text summarization algorithm for the Capstone Project at Springboard code bootcamp ☆54 · Updated 2 years ago
- A set of tools for leveraging pre-trained embeddings, active learning and model explainability for efficient document classification ☆29 · Updated 2 months ago
- A simple NLP library allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data… ☆242 · Updated 10 months ago
- Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health. ☆91 · Updated 3 years ago
- A monolingual and cross-lingual meta-embedding generation and evaluation framework ☆80 · Updated 2 years ago
- Hate speech dataset from Stormfront forum manually labelled at sentence level. ☆171 · Updated 4 years ago
- 🏖 Easy training and deployment of seq2seq models. ☆228 · Updated 4 years ago
- COVID-19 Open Research Dataset (CORD-19) Analysis ☆56 · Updated 2 years ago
- Fuzzy matching and more functionality for spaCy. ☆256 · Updated 8 months ago
- Social Media Mining Toolkit (SMMT) main repository ☆134 · Updated 2 years ago
- On Generating Extended Summaries of Long Documents ☆78 · Updated 4 years ago
- STriP Net: Semantic Similarity of Scientific Papers (S3P) Network ☆85 · Updated 2 years ago