alan-turing-institute/ReadabiliPy

A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.

☆359

Alternatives and similar repositories for ReadabiliPy

Users who are interested in ReadabiliPy are comparing it to the libraries listed below.

mozilla / readability
A standalone version of the readability lib
☆11,357 · Jul 9, 2026 · Updated 2 weeks ago
buriy / python-readability
fast python port of arc90's readability tool, updated to match latest readability.js!
☆2,895 · Jan 26, 2026 · Updated 5 months ago
adbar / trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM…
☆6,337 · Jul 18, 2026 · Updated last week
weblyzard / inscriptis
A python based HTML to text conversion library, command line client and Web service.
☆345 · Updated this week
scrapinghub / article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
☆376 · May 29, 2026 · Updated last month
matthewwithanm / python-markdownify
Convert HTML to Markdown
☆2,228 · Jun 30, 2026 · Updated 3 weeks ago
phihung / fh_utils
A collection of utilities for FastHTML projects.
☆14 · Oct 23, 2024 · Updated last year
MichalKarol / futurepool
FuturePool is a package that introduce known concept of multiprocessing Pool to the async/await world. It allows for easy translation fro…
☆14 · Nov 10, 2024 · Updated last year
miso-belica / jusText
Heuristic based boilerplate removal tool
☆819 · Feb 25, 2025 · Updated last year
NachiketGadekar1 / browserllama
Browser extension that lets you summarize and chat with any webpage using a local LLM of your choice.
☆23 · Oct 24, 2024 · Updated last year
rian-dolphin / fasthtml-chat
A chat implementation for FastHTML
☆12 · Sep 14, 2025 · Updated 10 months ago
Alir3z4 / html2text
Convert HTML to Markdown-formatted text.
☆2,169 · Oct 28, 2025 · Updated 8 months ago
Knowledgator / utca
Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr…
☆35 · Aug 21, 2025 · Updated 11 months ago
jmriebold / BoilerPy3
Python port of Boilerpipe library
☆96 · Aug 20, 2024 · Updated last year
fcrepo-exts / fcrepo-webapp-plus
Fcrepo4 webapp plus optional fcrepo dependencies
☆13 · Sep 30, 2020 · Updated 5 years ago
ChenyangGao / python-epub3
An awesome epub3 library.
☆15 · Dec 2, 2023 · Updated 2 years ago
csarven / linked-sdmx
Linked SDMX
☆17 · Oct 26, 2014 · Updated 11 years ago
zoho-labs / symspell
Rust python bindings for symspell
☆21 · Dec 25, 2023 · Updated 2 years ago
andymckay / django-google-fonts
☆18 · Dec 4, 2024 · Updated last year
pigroai / chatgpt-retrieval-plugin
The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by asking questions in everyday language.
☆11 · Apr 22, 2024 · Updated 2 years ago
philschmid / clipper.js
HTML to Markdown converter and crawler.
☆627 · Jan 9, 2024 · Updated 2 years ago
dottxt-ai / outlines
Structured Outputs
☆15,331 · Updated this week
simonw / shot-scraper
A CLI utility for taking screenshots of websites, recording video demos and scraping sites using JavaScript
☆2,532 · Jul 12, 2026 · Updated last week
unt-libraries / django-premis-event-service
Django app for managing PREMIS Events
☆14 · Apr 28, 2026 · Updated 2 months ago
mjordan / ocr_rest
A simple OCR service over REST
☆15 · Jul 29, 2014 · Updated 11 years ago
softwaredoug / searcharray
Full text search that feels like a numpy array
☆311 · May 4, 2026 · Updated 2 months ago
codelucas / newspaper
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
☆15,123 · Updated this week
Digital-Repository-of-Ireland / ansible-fedora4
Ansible deployment of fedora 4, single or clustered on ubuntu 14.04
☆10 · Nov 11, 2015 · Updated 10 years ago
Acreom / quickadd
Parse natural language time and date expressions in python
☆200 · Feb 18, 2026 · Updated 5 months ago
web-archive-group / heritrix-walkthrough
☆10 · Jun 10, 2016 · Updated 10 years ago
simonw / datasette-publish-vercel
Datasette plugin for publishing data using Vercel
☆47 · Aug 24, 2022 · Updated 3 years ago
lsb / sqlite-vector-search
☆31 · Sep 1, 2023 · Updated 2 years ago
simonw / llm-cluster
LLM plugin for clustering embeddings
☆82 · Mar 1, 2024 · Updated 2 years ago
fhamborg / news-please
news-please - an integrated web crawler and information extractor for news that just works
☆2,472 · Apr 14, 2026 · Updated 3 months ago
mnylc / islandora_multi_importer
This is a flexible, twig based, all cmodel, tabular data to islandora Object importer with optional ZeroMQ processing
☆16 · Nov 29, 2020 · Updated 5 years ago
adbar / courlan
Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
☆178 · Updated this week
richardanaya / gbnf
A library for working with GBNF files
☆31 · May 27, 2026 · Updated last month
ggravlingen / pyesef
Extract information from XBRL files in the ESEF format
☆13 · Jan 3, 2026 · Updated 6 months ago
Key-wxh / market-fish
Don't guess. Simulate. Multi-agent market prediction engine.
☆44 · Jul 7, 2026 · Updated 2 weeks ago