matthewwithanm/python-markdownify

Convert HTML to Markdown

☆2,225

Alternatives and similar repositories for python-markdownify

Users that are interested in python-markdownify are comparing it to the libraries listed below.

Alir3z4 / html2text
Convert HTML to Markdown-formatted text.
☆2,169 · Oct 28, 2025 · Updated 8 months ago
adbar / trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM…
☆6,318 · Updated this week
567-labs / instructor
structured outputs for llms
☆13,579 · Jul 13, 2026 · Updated last week
executablebooks / markdown-it-py
Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed. Now in Python!
☆1,339 · Jul 13, 2026 · Updated last week
aaronsw / html2text
Convert HTML to Markdown-formatted text.
☆2,819 · Feb 27, 2024 · Updated 2 years ago
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,684 · Updated this week
Unstructured-IO / unstructured
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean…
☆15,170 · Updated this week
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,272 · Updated this week
Python-Markdown / markdown
A Python implementation of John Gruber’s Markdown with Extension support.
☆4,226 · Jul 8, 2026 · Updated last week
mixmark-io / turndown
🛏 An HTML to Markdown converter written in JavaScript
☆11,334 · Jun 23, 2026 · Updated 3 weeks ago
alan-turing-institute / ReadabiliPy
A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.
☆359 · Dec 2, 2024 · Updated last year
run-llama / llama_index
LlamaIndex is the leading document agent and OCR platform
☆50,962 · Updated this week
BerriAI / litellm
The fastest, litest AI Gateway. Rust core with Python SDK. Call 100+ LLM APIs in OpenAI (or native) format with cost tracking, guardrails…
☆54,122 · Updated this week
lepture / mistune
A fast yet powerful Python Markdown parser with renderers and plugins.
☆3,057 · Updated this week
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,252 · Updated this week
miyuchina / mistletoe
A fast, extensible and spec-compliant Markdown parser in pure Python.
☆1,057 · Jul 11, 2026 · Updated last week
pydantic / pydantic
Data validation using Python type hints
☆28,330 · Updated this week
jina-ai / reader
Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
☆11,700 · May 22, 2026 · Updated last month
astral-sh / uv
An extremely fast Python package and project manager, written in Rust.
☆87,673 · Updated this week
deepset-ai / haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a…
☆25,955 · Updated this week
dlon / html2markdown
Conservatively convert html to markdown
☆99 · Sep 17, 2020 · Updated 5 years ago
jd / tenacity
Retrying library for Python
☆8,726 · Updated this week
pydantic / pydantic-ai
AI Agent Framework, the Pydantic way
☆18,674 · Updated this week
guidance-ai / guidance
A guidance language for controlling large language models.
☆21,685 · May 21, 2026 · Updated last month
frostming / marko
A markdown parser with high extensibility.
☆462 · Jul 12, 2026 · Updated last week
theskumar / python-dotenv
Reads key-value pairs from a .env file and can set them as environment variables. It helps in developing applications following the 12-fa…
☆8,826 · Updated this week
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆10,570 · Updated this week
chroma-core / chroma
Search infrastructure for AI
☆28,837 · Updated this week
JessicaTegner / pypandoc
Thin wrapper for "pandoc" (MIT)
☆1,146 · Jul 6, 2026 · Updated 2 weeks ago
openai / tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
☆18,783 · May 24, 2026 · Updated last month
trentm / python-markdown2
markdown2: A fast and complete implementation of Markdown in Python
☆2,819 · Jul 13, 2026 · Updated last week
dottxt-ai / outlines
Structured Outputs
☆14,573 · Updated this week
neuml / txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
☆12,733 · Updated this week
streamlit / streamlit
Streamlit — A faster way to build and share data apps.
☆45,292 · Updated this week
microsoft / playwright-python
Python version of the Playwright testing and automation library.
☆14,838 · Updated this week
py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,121 · Jun 30, 2026 · Updated 3 weeks ago
astral-sh / ruff
An extremely fast Python linter and code formatter, written in Rust.
☆48,710 · Updated this week
ijl / orjson
Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
☆8,169 · Updated this week
encode / httpx
A next generation HTTP client for Python. 🦋
☆15,356 · Mar 29, 2026 · Updated 3 months ago