buriy/python-readability

fast python port of arc90's readability tool, updated to match latest readability.js!

☆2,894

Alternatives and similar repositories for python-readability

Users that are interested in python-readability are comparing it to the libraries listed below.

timbertson / python-readability
[abandoned] python port of arc90's readability bookmarklet
☆542 · Jun 16, 2011 · Updated 15 years ago
grangier / python-goose
Html Content / Article Extractor, web scraping lib in Python
☆4,100 · Mar 10, 2026 · Updated 4 months ago
codelucas / newspaper
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
☆15,100 · Jul 8, 2026 · Updated last week
dragnet-org / dragnet
Just the facts -- web page content extraction
☆1,274 · Jul 8, 2025 · Updated last year
mozilla / readability
A standalone version of the readability lib
☆11,337 · Jul 9, 2026 · Updated last week
kingwkb / readability
a python readability
☆277 · Jun 22, 2017 · Updated 9 years ago
bookieio / breadability
Reworked https://www.readability.com/ parsing library (now https://mercury.postlight.com/ is living alternative)
☆205 · May 9, 2024 · Updated 2 years ago
misja / python-boilerpipe
Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages
☆542 · Jul 17, 2021 · Updated 5 years ago
goose3 / goose3
A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html
☆913 · Jun 22, 2026 · Updated 3 weeks ago
adbar / trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM…
☆6,295 · Jul 9, 2026 · Updated last week
alan-turing-institute / ReadabiliPy
A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.
☆360 · Dec 2, 2024 · Updated last year
Alir3z4 / html2text
Convert HTML to Markdown-formatted text.
☆2,169 · Oct 28, 2025 · Updated 8 months ago
miso-belica / jusText
Heuristic based boilerplate removal tool
☆818 · Feb 25, 2025 · Updated last year
miso-belica / sumy
Module for automatic summarization of text documents and HTML pages.
☆3,694 · Jun 23, 2026 · Updated 3 weeks ago
fhamborg / news-please
news-please - an integrated web crawler and information extractor for news that just works
☆2,468 · Apr 14, 2026 · Updated 3 months ago
kohlschutter / boilerpipe
Work in progress transmit from Google Code
☆1,127 · Jan 3, 2018 · Updated 8 years ago
rodricios / eatiht
An exercise in unsupervised machine learning: Extract Article's Text in HTml documents.
☆430 · Jan 16, 2026 · Updated 6 months ago
scrapinghub / article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
☆376 · May 29, 2026 · Updated last month
aaronsw / html2text
Convert HTML to Markdown-formatted text.
☆2,817 · Feb 27, 2024 · Updated 2 years ago
GeneralNewsExtractor / GeneralNewsExtractor
General-purpose extractor for the main text of news web pages, Beta version.
☆3,788 · Apr 21, 2026 · Updated 2 months ago
scrapinghub / portia
Visual scraping for Scrapy
☆9,508 · Jun 26, 2024 · Updated 2 years ago
michaelhelmick / lassie
Web Content Retrieval for Humans™
☆629 · Jul 30, 2022 · Updated 3 years ago
postlight / parser
📜 Extract meaningful content from the chaos of a web page
☆5,787 · Jul 10, 2024 · Updated 2 years ago
kurtmckee / feedparser
Parse feeds in Python
☆2,401 · Jul 6, 2026 · Updated last week
psf / requests-html
Pythonic HTML Parsing for Humans™
☆13,827 · Apr 16, 2024 · Updated 2 years ago
datalib / libextract
Extract data from websites using basic statistical magic
☆506 · Oct 2, 2020 · Updated 5 years ago
scrapinghub / splash
Lightweight, scriptable browser as a service with an HTTP API
☆4,191 · Aug 2, 2024 · Updated last year
deanmalmgren / textract
extract text from any document. no muss. no fuss.
☆4,663 · Jul 11, 2026 · Updated last week
scrapinghub / dateparser
python parser for human readable dates
☆2,844 · Updated this week
Gerapy / GerapyAutoExtractor
Auto Extractor Module
☆338 · Aug 19, 2024 · Updated last year
ReadabilityHoldings / python-readability-api
Python wrapper for the Readability API.
☆132 · Sep 8, 2021 · Updated 4 years ago
explosion / spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆33,741 · May 19, 2026 · Updated last month
ptwobrussell / python-boilerpipe
Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages
☆32 · Sep 2, 2016 · Updated 9 years ago
weblyzard / inscriptis
A python based HTML to text conversion library, command line client and Web service.
☆345 · Jun 22, 2026 · Updated 3 weeks ago
scrapinghub / extruct
Extract embedded metadata from HTML markup
☆967 · Apr 1, 2026 · Updated 3 months ago
scrapy / scrapely
A pure-python HTML screen-scraping library
☆1,884 · Apr 4, 2022 · Updated 4 years ago
chartbeat-labs / textacy
NLP, before and after spaCy
☆2,241 · Sep 22, 2023 · Updated 2 years ago
binux / pyspider
A Powerful Spider(Web Crawler) System in Python.
☆16,802 · Apr 30, 2024 · Updated 2 years ago
srid / readability
[unmaintained] Python version of arc90's *older* readability.js
☆47 · Oct 30, 2011 · Updated 14 years ago