codelucas/newspaper

newspaper3k is a library for news, full-text, and article metadata extraction in Python 3. Advanced docs:

☆15,105

Alternatives and similar repositories for newspaper

Users who are interested in newspaper are comparing it to the libraries listed below.

grangier / python-goose
Html Content / Article Extractor, web scraping lib in Python
☆4,101 · Mar 10, 2026 · Updated 4 months ago
fhamborg / news-please
news-please - an integrated web crawler and information extractor for news that just works
☆2,470 · Apr 14, 2026 · Updated 3 months ago
buriy / python-readability
fast python port of arc90's readability tool, updated to match latest readability.js!
☆2,894 · Jan 26, 2026 · Updated 5 months ago
explosion / spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆33,752 · May 19, 2026 · Updated 2 months ago
dragnet-org / dragnet
Just the facts -- web page content extraction
☆1,274 · Jul 8, 2025 · Updated last year
scrapinghub / portia
Visual scraping for Scrapy
☆9,508 · Jun 26, 2024 · Updated 2 years ago
miso-belica / sumy
Module for automatic summarization of text documents and HTML pages.
☆3,694 · Updated this week
clips / pattern
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
☆8,857 · Jun 10, 2024 · Updated 2 years ago
sloria / TextBlob
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
☆9,541 · Updated this week
scrapy / scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
☆63,223 · Updated this week
facebookresearch / fastText
Library for fast text representation and classification.
☆26,549 · Mar 22, 2024 · Updated 2 years ago
piskvorky / gensim
Topic Modelling for Humans
☆16,464 · Nov 1, 2025 · Updated 8 months ago
psf / requests-html
Pythonic HTML Parsing for Humans™
☆13,827 · Apr 16, 2024 · Updated 2 years ago
google / python-fire
Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.
☆28,219 · Jul 1, 2026 · Updated 2 weeks ago
goose3 / goose3
A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html
☆913 · Jun 22, 2026 · Updated 3 weeks ago
binux / pyspider
A Powerful Spider(Web Crawler) System in Python.
☆16,801 · Apr 30, 2024 · Updated 2 years ago
flairNLP / flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
☆14,382 · Oct 27, 2025 · Updated 8 months ago
adbar / trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM…
☆6,312 · Updated this week
deanmalmgren / textract
extract text from any document. no muss. no fuss.
☆4,669 · Jul 11, 2026 · Updated last week
joke2k / faker
Faker is a Python package that generates fake data for you.
☆19,336 · Updated this week
AndyTheFactory / newspaper4k
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
☆1,128 · Updated this week
apache / superset
Apache Superset is a Data Visualization and Data Exploration Platform
☆73,877 · Updated this week
mozilla / readability
A standalone version of the readability lib
☆11,341 · Jul 9, 2026 · Updated last week
GeneralNewsExtractor / GeneralNewsExtractor
General-purpose extractor for the main text of news web pages, Beta version.
☆3,788 · Apr 21, 2026 · Updated 2 months ago
michaelhelmick / lassie
Web Content Retrieval for Humans™
☆629 · Jul 30, 2022 · Updated 3 years ago
seatgeek / fuzzywuzzy
Fuzzy String Matching in Python
☆9,262 · Feb 24, 2023 · Updated 3 years ago
Miserlou / Zappa
Serverless Python
☆11,835 · Mar 23, 2023 · Updated 3 years ago
kotartemiy / newscatcher
Programmatically collect normalized news from (almost) any website.
☆2,989 · Oct 30, 2020 · Updated 5 years ago
kennethreitz / records
SQL for Humans™
☆7,219 · Feb 9, 2026 · Updated 5 months ago
arrow-py / arrow
🏹 Better dates & times for Python
☆9,048 · Jun 22, 2026 · Updated 3 weeks ago
tqdm / tqdm
A Fast, Extensible Progress Bar for Python and CLI
☆31,240 · Updated this week
chartbeat-labs / textacy
NLP, before and after spaCy
☆2,239 · Sep 22, 2023 · Updated 2 years ago
streamlit / streamlit
Streamlit: A faster way to build and share data apps.
☆45,269 · Updated this week
getredash / redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
☆28,708 · Jul 9, 2026 · Updated last week
sanic-org / sanic
Accelerate your web app development | Build fast. Run fast.
☆18,633 · Updated this week
plotly / dash
Data Apps & Dashboards for Python. No JavaScript Required.
☆24,326 · Updated this week
scrapinghub / splash
Lightweight, scriptable browser as a service with an HTTP API
☆4,190 · Aug 2, 2024 · Updated last year
kurtmckee / feedparser
Parse feeds in Python
☆2,402 · Jul 6, 2026 · Updated last week
vi3k6i5 / flashtext
Extract Keywords from sentence or Replace keywords in sentences.
☆5,714 · Apr 13, 2025 · Updated last year