fhamborg/news-please

news-please - an integrated web crawler and information extractor for news that just works

☆2,472

Alternatives and similar repositories for news-please

Users who are interested in news-please are comparing it to the libraries listed below.

fhamborg / Giveme5W1H
Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
☆533 · Oct 25, 2024 · Updated last year
fhamborg / NewsMTSC
Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k se…
☆156 · Jul 18, 2025 · Updated last year
codelucas / newspaper
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
☆15,121 · Updated this week
commoncrawl / news-crawl
News crawling with StormCrawler - stores content as WARC
☆375 · Updated this week
santhoshse7en / news-fetch
A Python package that helps scrape all news details from any news website
☆227 · Jul 10, 2026 · Updated 2 weeks ago
adbar / trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM…
☆6,334 · Updated this week
flairNLP / fundus
A very simple news crawler with a funny name
☆468 · Updated this week
AndyTheFactory / newspaper4k
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
☆1,127 · Updated this week
networkdynamics / seldonite
A News Article Collection Library
☆22 · Mar 31, 2023 · Updated 3 years ago
buriy / python-readability
fast python port of arc90's readability tool, updated to match latest readability.js!
☆2,894 · Jan 26, 2026 · Updated 5 months ago
lewisdonovan / google-news-scraper
Lightweight scraper for Google News
☆376 · May 20, 2026 · Updated 2 months ago
scrapinghub / article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
☆376 · May 29, 2026 · Updated last month
kotartemiy / pygooglenews
If Google News had a Python library
☆1,389 · Dec 9, 2024 · Updated last year
flairNLP / flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
☆14,382 · Oct 27, 2025 · Updated 8 months ago
dragnet-org / dragnet
Just the facts -- web page content extraction
☆1,274 · Jul 8, 2025 · Updated last year
ivbeg / newsworker
Advanced news feeds extractor and finder library. Helps to automatically extract news from websites without RSS/ATOM feeds
☆86 · Jul 5, 2026 · Updated 2 weeks ago
miso-belica / jusText
Heuristic based boilerplate removal tool
☆819 · Feb 25, 2025 · Updated last year
commoncrawl / cdx_toolkit
A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
☆208 · Jun 24, 2026 · Updated last month
kotartemiy / newscatcher
Programmatically collect normalized news from (almost) any website.
☆2,989 · Oct 30, 2020 · Updated 5 years ago
ranahaani / GNews
A happy and lightweight Python package that provides an API to search for articles on Google News and returns a JSON response.
☆986 · Jun 25, 2026 · Updated 3 weeks ago
goose3 / goose3
A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html
☆912 · Updated this week
johnbumgarner / newspaper3_usage_overview
This repository provides usage examples for the Python module Newspaper3k.
☆152 · Jan 2, 2024 · Updated 2 years ago
lukasgebhard / Political-News-Filter
A classifier that distinguishes political from non-political news articles.
☆31 · Jul 30, 2023 · Updated 2 years ago
LuChang-CS / news-crawler
A news crawler for BBC News, Reuters and New York Times.
☆131 · Dec 8, 2022 · Updated 3 years ago
fhamborg / NewsWCL50
The first, open access evaluation dataset for methods to identify bias by word choice and labeling
☆26 · Oct 30, 2025 · Updated 8 months ago
jfilter / clean-text
🧹 Python package for text cleaning
☆1,026 · May 15, 2026 · Updated 2 months ago
fhamborg / NewsBirdServer
Matrix-based News Aggregation to Explore Media Bias
☆20 · Jun 26, 2018 · Updated 8 years ago
fhamborg / Giveme5W
Extraction of the five journalistic W-questions (5W) from news articles
☆19 · May 16, 2018 · Updated 8 years ago
JasonKessler / scattertext
Beautiful visualizations of how language differs among document types.
☆2,338 · Jul 4, 2026 · Updated 2 weeks ago
chartbeat-labs / textacy
NLP, before and after spaCy
☆2,239 · Sep 22, 2023 · Updated 2 years ago
MilaNLProc / contextualized-topic-models
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coher…
☆1,272 · Jul 24, 2025 · Updated last year
miso-belica / sumy
Module for automatic summarization of text documents and HTML pages.
☆3,696 · Updated this week
twintproject / twint
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, fo…
☆16,394 · Feb 23, 2023 · Updated 3 years ago
deepset-ai / haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a…
☆25,997 · Updated this week
kevinlu1248 / pyate
PYthon Automated Term Extraction
☆318 · Feb 8, 2023 · Updated 3 years ago
DerwenAI / pytextrank
Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
☆2,219 · Jun 24, 2026 · Updated last month
NewsFetch / NewsFetch
News API - fetch news from CommonCrawl, parse with NewsPlease, enrich with pre-trained machine-learning models, to structured searchable …
☆31 · Oct 5, 2022 · Updated 3 years ago
opensanctions / storyweb
Extract networks of entities from journalistic reporting
☆49 · Jul 17, 2023 · Updated 3 years ago
doccano / doccano
Open source annotation tool for machine learning practitioners.
☆10,715 · Apr 14, 2026 · Updated 3 months ago