matthewwithanm / python-markdownify
Convert HTML to Markdown
☆1,564 · Updated this week
Alternatives and similar repositories for python-markdownify:
Users who are interested in python-markdownify are comparing it to the libraries listed below:
- Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed. Now in Python! ☆846 · Updated this week
- Thin wrapper for "pandoc" (MIT) ☆970 · Updated 2 weeks ago
- Python bindings to PDFium ☆560 · Updated this week
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆1,981 · Updated last week
- Convert HTML to Markdown-formatted text. ☆1,972 · Updated last week
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆877 · Updated 2 weeks ago
- pgvector support for Python ☆1,179 · Updated this week
- A fast yet powerful Python Markdown parser with renderers and plugins. ☆2,754 · Updated 3 weeks ago
- Small, dependency-free, fast Python package to infer binary file types checking the magic numbers signature ☆697 · Updated 7 months ago
- Search for text, news, images and videos using the DuckDuckGo.com search engine ☆1,546 · Updated last week
- A python module to repair invalid JSON from LLMs ☆1,793 · Updated this week
- extract text from any document. no muss. no fuss. ☆4,094 · Updated 4 months ago
- Extract structured text from pdfs quickly ☆469 · Updated last month
- Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors). ☆1,249 · Updated 2 months ago
- Improved file parsing for LLMs ☆2,921 · Updated 5 months ago
- Fuzzy String Matching in Python ☆3,176 · Updated last month
- A markdown parser with high extensibility. ☆389 · Updated 2 weeks ago
- A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG ☆377 · Updated 8 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆290 · Updated 3 weeks ago
- A fast, extensible and spec-compliant Markdown parser in pure Python. ☆906 · Updated 2 months ago
- Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM… ☆4,154 · Updated last month
- markdown2: A fast and complete implementation of Markdown in Python ☆2,741 · Updated last week
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables. ☆7,590 · Updated 3 weeks ago
- Demos, examples and utilities using PyMuPDF ☆651 · Updated 9 months ago
- A python module that wraps the pdftoppm utility to convert PDF to PIL Image object ☆1,768 · Updated 9 months ago
- Rapid fuzzy string matching in Python using various string metrics ☆3,037 · Updated last week
- Parse feeds in Python ☆2,094 · Updated 2 weeks ago
- Python library for creating PEG parsers ☆2,304 · Updated 2 weeks ago
- UniTable: Towards a Unified Table Foundation Model ☆461 · Updated 10 months ago
- A Python implementation of John Gruber’s Markdown with Extension support. ☆3,961 · Updated this week