Alir3z4 / html2text
Convert HTML to Markdown-formatted text.
☆2,040 Updated 4 months ago
Alternatives and similar repositories for html2text
Users who are interested in html2text are comparing it to the libraries listed below.
- Parse feeds in Python ☆2,175 Updated 2 weeks ago
- fast python port of arc90's readability tool, updated to match latest readability.js! ☆2,837 Updated 3 months ago
- Convert HTML to Markdown-formatted text. ☆2,740 Updated last year
- extract text from any document. no muss. no fuss. ☆4,263 Updated 8 months ago
- A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html ☆881 Updated 8 months ago
- Convert HTML to Markdown ☆1,756 Updated 2 weeks ago
- Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors ☆1,262 Updated 3 weeks ago
- python parser for human readable dates ☆2,711 Updated last week
- Python character encoding detector ☆2,282 Updated 7 months ago
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,611 Updated 4 months ago
- Port of Google's language-detection library to Python. ☆1,830 Updated 5 months ago
- Thin wrapper for "pandoc" (MIT) ☆1,028 Updated last month
- Wkhtmltopdf python wrapper to convert html to pdf ☆2,030 Updated last year
- Convert Word documents (.docx files) to HTML ☆981 Updated 3 weeks ago
- A python based HTML to text conversion library, command line client and Web service. ☆315 Updated 2 weeks ago
- Standards-compliant library for parsing and serializing HTML documents and fragments in Python ☆1,202 Updated last year
- Returns unicode slugs ☆1,545 Updated 2 months ago
- pdfrw is a pure Python library that reads and writes PDFs ☆1,897 Updated last year
- A python wrapper for libmagic ☆2,803 Updated last week
- Heuristic based boilerplate removal tool ☆790 Updated 5 months ago
- The lxml XML toolkit for Python ☆2,908 Updated this week
- 🌐 URL parsing and manipulation made easy. ☆2,708 Updated 4 months ago
- Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL). ☆1,921 Updated 2 weeks ago
- A fast yet powerful Python Markdown parser with renderers and plugins. ☆2,848 Updated 3 months ago
- A jquery-like library for python ☆2,360 Updated 11 months ago
- Extract embedded metadata from HTML markup ☆929 Updated 5 months ago
- emoji terminal output for Python ☆1,993 Updated 4 months ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,296 Updated 2 years ago
- Small, dependency-free, fast Python package to infer binary file types checking the magic numbers signature ☆717 Updated 3 months ago
- Python module to generate ATOM feeds, RSS feeds and Podcasts. ☆770 Updated last year