Alir3z4 / html2text
Convert HTML to Markdown-formatted text.
☆2,077 · Updated 2 weeks ago
Alternatives and similar repositories for html2text
Users who are interested in html2text are comparing it to the libraries listed below.
- fast python port of arc90's readability tool, updated to match latest readability.js! ☆2,852 · Updated 6 months ago
- Parse feeds in Python ☆2,230 · Updated last week
- A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html ☆893 · Updated 2 weeks ago
- Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors ☆1,285 · Updated last week
- python parser for human readable dates ☆2,739 · Updated 2 weeks ago
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,629 · Updated 6 months ago
- Port of Google's language-detection library to Python. ☆1,853 · Updated 8 months ago
- Convert HTML to Markdown ☆1,840 · Updated 3 months ago
- Convert HTML to Markdown-formatted text. ☆2,794 · Updated last year
- extract text from any document. no muss. no fuss. ☆4,357 · Updated 11 months ago
- Python character encoding detector ☆2,293 · Updated this week
- Convert Word documents (.docx files) to HTML ☆1,020 · Updated last month
- Heuristic based boilerplate removal tool ☆803 · Updated 8 months ago
- A jquery-like library for python ☆2,364 · Updated last year
- Returns unicode slugs ☆1,558 · Updated last month
- Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes ☆2,730 · Updated 2 weeks ago
- A python based HTML to text conversion library, command line client and Web service. ☆323 · Updated 3 weeks ago
- Thin wrapper for "pandoc" (MIT) ☆1,057 · Updated this week
- Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python. ☆1,460 · Updated last month
- A python wrapper for libmagic ☆2,839 · Updated last month
- A fast yet powerful Python Markdown parser with renderers and plugins. ☆2,893 · Updated 2 months ago
- A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) us… ☆1,497 · Updated 2 years ago
- Useful extensions to the standard Python datetime features ☆2,555 · Updated last month
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,166 · Updated 2 weeks ago
- Extract embedded metadata from HTML markup ☆935 · Updated last month
- Persistent HTTP cache for python requests ☆1,458 · Updated last month
- pdfrw is a pure Python library that reads and writes PDFs ☆1,911 · Updated last year
- Fixes mojibake and other glitches in Unicode text, after the fact. ☆3,982 · Updated last year
- Standards-compliant library for parsing and serializing HTML documents and fragments in Python ☆1,209 · Updated last year
- Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL). ☆1,943 · Updated 3 weeks ago