aaronsw / html2text
Convert HTML to Markdown-formatted text.
☆2,838 · Updated last year
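To make "HTML to Markdown-formatted text" concrete, here is a toy sketch of the idea. It is a hypothetical `html_to_markdown` helper built on Python's standard-library `html.parser`. It is not html2text's actual code or API. It handles only headings, paragraphs, bold/italic, links and list items, and it assumes well-formed input.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser


class MiniMarkdownConverter(HTMLParser):
    """Hypothetical toy converter; NOT html2text's implementation.

    Handles only headings, paragraphs, bold/italic, links and list items,
    and does not escape Markdown special characters in the text.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.out: list[str] = []
        self.href: str | None = None  # target of the <a> currently open

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in ("p", "ul", "ol"):
            self.out.append("\n\n")  # block elements start a new paragraph
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("_")
        elif tag == "li":
            self.out.append("\n* ")  # ordered lists are also rendered as bullets
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("_")
        elif tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None
        elif tag in ("ul", "ol"):
            self.out.append("\n")

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data)  # collapse whitespace like a browser
        # Drop leading whitespace at the start of a line or after a space.
        if not self.out or self.out[-1].endswith(("\n", " ")):
            text = text.lstrip()
        if text:
            self.out.append(text)

    def markdown(self) -> str:
        text = "".join(self.out)
        text = re.sub(r"[ \t]+\n", "\n", text)  # trailing spaces on lines
        text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line
        return text.strip() + "\n"


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdownConverter()
    parser.feed(html)
    parser.close()
    return parser.markdown()


print(html_to_markdown('<h1>Title</h1><p>Some <b>bold</b> and '
                       '<a href="https://example.com">a link</a>.</p>'
                       '<ul><li>one</li><li>two</li></ul>'))
```

The event-driven `HTMLParser` keeps the sketch dependency-free. A real converter also has to deal with nested lists, tables, escaping, line wrapping and malformed markup, which is where dedicated libraries earn their keep.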
Alternatives and similar repositories for html2text
Users interested in html2text are comparing it to the libraries listed below.
- Convert HTML to Markdown-formatted text. ☆2,100 · Updated last month
- fast python port of arc90's readability tool, updated to match latest readability.js! ☆2,871 · Updated 7 months ago
- markdown2: A fast and complete implementation of Markdown in Python ☆2,808 · Updated 3 weeks ago
- HTML Content / Article Extractor, web scraping library in Python ☆4,052 · Updated 3 years ago
- A fast yet powerful Python Markdown parser with renderers and plugins. ☆2,920 · Updated 2 weeks ago
- Standards-compliant library for parsing and serializing HTML documents and fragments in Python ☆1,212 · Updated last year
- A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html ☆893 · Updated this week
- [abandoned] python port of arc90's readability bookmarklet ☆542 · Updated 14 years ago
- Convert Word documents (.docx files) to HTML ☆1,036 · Updated 3 weeks ago
- A versatile Python library for EPUB2/EPUB3 manipulation and processing. ☆1,721 · Updated 3 weeks ago
- A Python implementation of John Gruber’s Markdown with Extension support. ☆4,123 · Updated last week
- extract text from any document. no muss. no fuss. ☆4,387 · Updated last year
- A jquery-like library for python ☆2,374 · Updated last year
- A Python Static Website Generator (See https://duct-ui.org from the author). ☆1,701 · Updated last year
- A pure-python HTML screen-scraping library ☆1,887 · Updated 3 years ago
- Parse feeds in Python ☆2,247 · Updated last week
- Convert HTML to Markdown ☆1,975 · Updated 3 weeks ago
- Thin wrapper for "pandoc" (MIT) ☆1,084 · Updated last week
- pdfrw is a pure Python library that reads and writes PDFs ☆1,912 · Updated last year
- Reads, queries and modifies Microsoft Word 2007/2008 docx files. ☆1,073 · Updated 10 years ago
- A library for reading (unencrypted) mobi-reader files in Python ☆156 · Updated 2 years ago
- Python Command-line Application Tools ☆97 · Updated 2 years ago
- A library for converting HTML into PDFs using ReportLab ☆2,361 · Updated 3 months ago
- newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs: ☆14,897 · Updated last week
- Python character encoding detector ☆2,309 · Updated 2 weeks ago
- Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes ☆2,739 · Updated last week
- Webkit based scriptable web browser for python. ☆2,762 · Updated last year
- A fully tested, abstract interface to creating OAuth clients and servers. ☆3,005 · Updated last year
- Python Subprocesses for Humans™. ☆2,265 · Updated 8 years ago
- A generator library for concise, unambiguous and URL-safe UUIDs. ☆2,163 · Updated 2 weeks ago