parkervg / news-article-clustering
A document similarity project attempting to cluster news stories covering identical events.
☆25 · Updated 4 years ago
Alternatives and similar repositories for news-article-clustering:
Users who are interested in news-article-clustering are comparing it to the libraries listed below:
- Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how? ☆513 · Updated 3 months ago
- Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k se… ☆146 · Updated last year
- A Dataset of German Legal Documents for Named Entity Recognition ☆165 · Updated 2 years ago
- A Python Package which helps to scrape all news details from any news websites ☆191 · Updated 3 months ago
- This repository provides usage examples for the Python module Newspaper3k. ☆146 · Updated last year
- Keyword extraction using TextRank algorithm after pre-processing the text with lemmatization, filtering unwanted parts-of-speech and othe… ☆114 · Updated 5 years ago
- Ten Thousand German News Articles Dataset for Topic Classification ☆84 · Updated 2 years ago
- Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python ☆269 · Updated last year
- Cleans Reddit Text Data ☆81 · Updated 4 years ago
- A Python library for calculating a large variety of metrics from text ☆324 · Updated 2 months ago
- 📊 Semantic search for headlines and story text ☆359 · Updated last year
- Text analysis with networks. ☆286 · Updated 9 months ago
- Tag news stories based on models trained on the NYT corpus. ☆42 · Updated last year
- SpikeX - SpaCy Pipes for Knowledge Extraction ☆397 · Updated 3 years ago
- Article extraction benchmark: dataset and evaluation scripts ☆301 · Updated 9 months ago
- A Python program to scrape Google's Knowledge Panels for details on a list of businesses ☆19 · Updated last year
- Document similarity algorithms experiment - Jaccard, TF-IDF, Doc2vec, USE, and BERT. ☆84 · Updated 4 years ago
- Steam review texting embedding analysis ☆141 · Updated last year
- Quote extraction for modular journalism (JournalismAI collab 2021) ☆226 · Updated 3 years ago
- A spaCy wrapper for DBpedia Spotlight ☆108 · Updated last year
- Framework for fine-tuning pretrained transformers for Named-Entity Recognition (NER) tasks ☆157 · Updated 2 years ago
- PYthon Automated Term Extraction ☆309 · Updated 2 years ago
- spaCy module for linking text to Wikidata items ☆229 · Updated last year
- Pretrained BERT model for analysing COVID-19 Twitter data ☆184 · Updated last year
- spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface ☆253 · Updated 5 months ago
- A data set and model for German sentiment classification. ☆66 · Updated 6 months ago
- Repository for TweetEval ☆363 · Updated 2 years ago
- Scrape news articles and analyze them using NLP to quantify the gender gap in Canadian mainstream media ☆40 · Updated 9 months ago
- Given a sentence, predict if the sentence is a question or not ☆47 · Updated 5 years ago
- Linguistic Inquiry and Word Count (LIWC) analyzer ☆203 · Updated 3 years ago