Reworked parsing library for https://www.readability.com/ (https://mercury.postlight.com/ is now the living alternative)
☆205 · May 9, 2024 · Updated last year
Alternatives and similar repositories for breadability
Users interested in breadability compare it to the libraries listed below.
- python-readability, but faster (mirror-ish) · ☆82 · Jan 24, 2012 · Updated 14 years ago
- Fast Python port of arc90's readability tool, updated to match the latest readability.js · ☆2,889 · Jan 26, 2026 · Updated last month
- A Python readability implementation · ☆277 · Jun 22, 2017 · Updated 8 years ago
- Python wrapper for the Readability API · ☆134 · Sep 8, 2021 · Updated 4 years ago
- Work in progress, transferred from Google Code · ☆1,128 · Jan 3, 2018 · Updated 8 years ago
- [abandoned] Python port of arc90's readability bookmarklet · ☆543 · Jun 16, 2011 · Updated 14 years ago
- HTML content/article extractor, web scraping library in Python · ☆4,063 · Dec 26, 2021 · Updated 4 years ago
- ☆18 · Jan 14, 2020 · Updated 6 years ago
- A difficulty-aware embedding of complementary deep networks for image classification · ☆13 · Jul 25, 2024 · Updated last year
- An exercise in unsupervised machine learning: extract an article's text from HTML documents · ☆431 · Jan 16, 2026 · Updated last month
- C library for efficient string matching with Aho-Corasick · ☆21 · Jan 20, 2012 · Updated 14 years ago
- [unmaintained] Python version of arc90's *older* readability.js · ☆47 · Oct 30, 2011 · Updated 14 years ago
- Readability/Boilerpipe extraction in Python · ☆55 · May 6, 2016 · Updated 9 years ago
- Extract clean(er), readable text from web pages via Mercury Web Parser · ☆122 · Jun 30, 2025 · Updated 8 months ago
- Exploration and charting of world income distribution · ☆12 · Oct 15, 2019 · Updated 6 years ago
- Solution to Kaggle's Google Research Football Competition · ☆14 · Dec 2, 2020 · Updated 5 years ago
- The more often you click a word in the headlines, the more interesting your news becomes · ☆13 · Mar 27, 2017 · Updated 8 years ago
- Code to remove "noise" from the hOCR output of Tesseract OCR · ☆14 · Oct 24, 2016 · Updated 9 years ago
- couchapp + TiddlyWiki plugins to serve a TiddlyWiki from CouchDB and read and write tiddlers to the database · ☆18 · Sep 17, 2010 · Updated 15 years ago
- Basic ML code · ☆13 · Dec 2, 2019 · Updated 6 years ago
- A web scraper in Python using Django and Celery · ☆16 · May 12, 2013 · Updated 12 years ago
- ☆16 · Aug 7, 2019 · Updated 6 years ago
- Article content extraction database · ☆40 · Mar 1, 2023 · Updated 3 years ago
- Just the facts: web page content extraction · ☆1,280 · Jul 8, 2025 · Updated 7 months ago
- A collection of dashboard modules for Django Admin Tools; includes dashboards for Memcache statistics, Varnish statistics, and RSS dashboa… · ☆29 · Jan 11, 2012 · Updated 14 years ago
- Framework for evaluating text extraction algorithms implemented as web services · ☆42 · Jun 30, 2012 · Updated 13 years ago
- A minimal Hugo blog layout · ☆15 · Sep 25, 2019 · Updated 6 years ago
- Pub/sub utilities for Django · ☆19 · Jun 25, 2022 · Updated 3 years ago
- ☆21 · Jun 13, 2019 · Updated 6 years ago
- Heuristic-based boilerplate removal tool · ☆811 · Feb 25, 2025 · Updated last year
- Module for automatic summarization of text documents and HTML pages · ☆3,661 · Feb 14, 2026 · Updated 2 weeks ago
- A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html · ☆902 · Feb 6, 2026 · Updated 3 weeks ago
- ☆25 · Jan 19, 2023 · Updated 3 years ago
- 📚 Turn any web page into a clean view · ☆2,523 · Apr 3, 2021 · Updated 4 years ago
- Reddit title generator API based on GPT-2 · ☆18 · Dec 26, 2019 · Updated 6 years ago
- Open ONI (Open Online Newspaper Initiative) Django web app · ☆54 · Apr 3, 2025 · Updated 10 months ago
- Top-1 solution to the TBrain customer contract-renewal amount prediction (客戶續約金額預測) machine learning competition · ☆23 · Sep 22, 2018 · Updated 7 years ago
- ☆23 · Jun 11, 2019 · Updated 6 years ago
- Kaggle | 45th place solution for IEEE's Signal Processing Society - Camera Model Identification · ☆30 · May 6, 2019 · Updated 6 years ago