parkervg / news-article-clustering
A document similarity project attempting to cluster news stories covering identical events.
☆26Updated 4 years ago
Alternatives and similar repositories for news-article-clustering:
Users who are interested in news-article-clustering are comparing it to the libraries listed below:
- Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k se…☆150Updated last year
- Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?☆519Updated 6 months ago
- Scrape news articles and analyze them using NLP to quantify the gender gap in Canadian mainstream media☆42Updated 11 months ago
- This repository provides usage examples for the Python module Newspaper3k.☆147Updated last year
- Scrape data from the Quora website: questions related to certain topics, answers given to certain questions, and user profile data☆54Updated 2 years ago
- Text analysis with networks.☆284Updated 3 weeks ago
- SKILLSPAN: Competences as Spans for Skill Extraction from Job Postings☆60Updated 2 months ago
- A spaCy wrapper for DBpedia Spotlight☆109Updated 2 years ago
- The objective of this project is to scrape a corpus of news articles from a set of web pages, pre-process the corpus, and then to apply u…☆51Updated 7 years ago
- Google News Scraper for languages like Japanese, Chinese... [VPN Support]☆98Updated 3 years ago
- Article extraction benchmark: dataset and evaluation scripts☆312Updated last year
- Source code for the Medium article "Extracting the author of news stories with DOM-based segmentation and BERT"☆29Updated 5 years ago
- Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python☆268Updated last year
- Package that returns a company embedding given a company name☆45Updated 4 years ago
- Tag news stories based on models trained on the NYT corpus.☆42Updated 2 years ago
- Steam review text embedding analysis☆141Updated 2 years ago
- Code and Dataset for the Bhola et al. (2020) Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classifi…☆53Updated 3 years ago
- A multithread Pushshift.io API Wrapper for reddit.com comment and submission searches.☆217Updated 2 years ago
- A Python package that helps scrape news details from any news website☆199Updated 5 months ago
- A text analysis application for performing common NLP tasks through a web dashboard interface and an API☆125Updated 6 years ago
- Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19☆14Updated 4 years ago
- Train, evaluate, and use different unsupervised topic modelling algorithms using a RESTful API.☆36Updated last year
- 📗 Score text readability using a number of formulas: Flesch-Kincaid Grade Level, Gunning Fog, ARI, Dale Chall, SMOG, and more☆375Updated 7 months ago
- Cleans Reddit Text Data☆81Updated 5 years ago
- Quote extraction for modular journalism (JournalismAI collab 2021)☆227Updated 3 years ago
- Social analysis based on WhatsApp data☆143Updated last year
- A Corpus of 475,000 Industrial Occupations☆67Updated 4 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata☆94Updated 2 years ago