johnbumgarner / newshound
This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.
☆33 · Updated 2 years ago
Alternatives and similar repositories for newshound:
Users who are interested in newshound are comparing it to the libraries listed below:
- A News Article Collection Library ☆22 · Updated 2 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 5 years ago
- Fastlaw's purpose is to replace generic word embeddings for supervised machine-learning NLP tasks involving legal texts. ☆38 · Updated 5 years ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 · Updated last year
- Text classification AutoML ☆21 · Updated 3 years ago
- Simple and clean Python implementation of TextRank as described in the seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 · Updated 4 years ago
- Labeled segmentation for the document structure of printed books ☆13 · Updated 7 years ago
- Extract dates from text ☆64 · Updated 4 years ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values … ☆15 · Updated 6 years ago
- NLG Best Practices for Data-Efficient Modeling: How to Train Production-Ready Models with Little Data ☆10 · Updated 3 years ago
- Detecting gibberish as a type of sentiment analysis with GPT-2 ☆24 · Updated 4 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated… ☆26 · Updated 2 years ago
- Python-based Wikidata framework for easy dataframe extraction ☆43 · Updated last year
- News API: fetch news from CommonCrawl, parse with NewsPlease, enrich with pre-trained machine-learning models, to structured searchable … ☆28 · Updated 2 years ago
- An open-source NLP library: fast text cleaning and preprocessing ☆23 · Updated 3 years ago
- Find RSS, Atom, XML, and RDF feeds on webpages ☆30 · Updated 5 months ago
- Extract knowledge from raw text ☆13 · Updated 3 years ago
- MoodCat😼 classifies the mood of English sentences. ☆14 · Updated 2 years ago
- Rust Python bindings for SymSpell ☆19 · Updated last year
- This repository provides usage examples for the Python module Newspaper3k. ☆146 · Updated last year
- Tool for the Automatic Assessment of Lexical Diversity ☆11 · Updated 4 years ago
- Generate product descriptions, blogs, ads, and more using a GPT architecture with a single request to the TextCortex API (a.k.a. Hemingwai) ☆40 · Updated 2 years ago
- Code and data for Teddy (https://arxiv.org/abs/2001.05171). ☆15 · Updated 2 years ago
- Interpretable feature construction from taxonomies for text classification ☆18 · Updated 3 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 · Updated 2 years ago
- A list of over 5000 US news domains and their social media accounts ☆44 · Updated 2 years ago
- Parse government documents into well-formed JSON ☆68 · Updated last month
- Finds linguistic patterns effortlessly ☆36 · Updated last year
- An ongoing series of notebooks aimed at helping fellow NLP enthusiasts think about applying new tools and techniques to practical tasks. ☆18 · Updated 4 years ago
- An experimental implementation of Burrows's Delta in Python 3 ☆21 · Updated 3 years ago