aaronsw / html2text
Convert HTML to Markdown-formatted text.
☆2,740 Updated last year
Alternatives and similar repositories for html2text
Users who are interested in html2text are comparing it to the libraries listed below.
- Convert HTML to Markdown-formatted text. ☆2,040 Updated 4 months ago
- fast python port of arc90's readability tool, updated to match latest readability.js! ☆2,837 Updated 3 months ago
- A fast yet powerful Python Markdown parser with renderers and plugins. ☆2,848 Updated 3 months ago
- markdown2: A fast and complete implementation of Markdown in Python ☆2,786 Updated 3 weeks ago
- extract text from any document. no muss. no fuss. ☆4,263 Updated 8 months ago
- Lightweight, scriptable browser as a service with an HTTP API ☆4,169 Updated last year
- Html Content / Article Extractor, web scraping lib in Python ☆4,047 Updated 3 years ago
- Parse feeds in Python ☆2,181 Updated 2 weeks ago
- A Python implementation of John Gruber’s Markdown with Extension support. ☆4,058 Updated 3 weeks ago
- Webkit based scriptable web browser for python. ☆2,764 Updated last year
- Scrapy+Splash for JavaScript integration ☆3,222 Updated 6 months ago
- A pure-python HTML screen-scraping library ☆1,882 Updated 3 years ago
- Reads, queries and modifies Microsoft Word 2007/2008 docx files. ☆1,072 Updated 9 years ago
- Standards-compliant library for parsing and serializing HTML documents and fragments in Python ☆1,202 Updated last year
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,298 Updated 2 years ago
- Mustache in Python ☆1,311 Updated 3 years ago
- A jquery-like library for python ☆2,360 Updated 11 months ago
- Python character encoding detector ☆2,282 Updated 7 months ago
- Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize. ☆615 Updated 8 years ago
- Thin wrapper for "pandoc" (MIT) ☆1,028 Updated last month
- Python Command-line Application Tools ☆97 Updated last year
- simplejson is a simple, fast, extensible JSON encoder/decoder for Python ☆1,688 Updated 5 months ago
- A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html ☆881 Updated 8 months ago
- 🌐 URL parsing and manipulation made easy. ☆2,711 Updated last week
- A fully tested, abstract interface to creating OAuth clients and servers. ☆3,003 Updated last year
- Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes ☆2,713 Updated 2 months ago
- Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors ☆1,264 Updated 3 weeks ago
- The lxml XML toolkit for Python ☆2,908 Updated this week
- ☆3,707 Updated 4 years ago
- A Python library for automating interaction with websites. ☆4,784 Updated last week