mwilliamson / python-mammoth
Convert Word documents (.docx files) to HTML
☆942 · Updated 4 months ago
Alternatives and similar repositories for python-mammoth
Users interested in python-mammoth are comparing it to the libraries listed below.
- A pure python based utility to extract text and images from docx files. ☆546 · Updated last month
- Create and modify Word documents with Python ☆5,000 · Updated 8 months ago
- An extendable docx file format parser and converter ☆191 · Updated 4 years ago
- Python bindings to PDFium ☆568 · Updated this week
- A fast yet powerful Python Markdown parser with renderers and plugins. ☆2,765 · Updated last month
- Thin wrapper for "pandoc" (MIT) ☆981 · Updated last month
- Demos, examples and utilities using PyMuPDF ☆659 · Updated 10 months ago
- Use a docx as a jinja2 template ☆2,187 · Updated last week
- Simplify DOCX files to JSON ☆235 · Updated 7 months ago
- Python API for PDF documents ☆121 · Updated 8 months ago
- A fast and friendly PDF scraping library. ☆777 · Updated last year
- A utility to read and write PDFs with Python ☆335 · Updated 3 years ago
- Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files ☆1,240 · Updated 2 weeks ago
- A Python library for reading and writing PDF, powered by QPDF ☆2,340 · Updated 3 weeks ago
- A python module that wraps the pdftoppm utility to convert PDF to PIL Image object ☆1,783 · Updated 9 months ago
- pdfrw is a pure Python library that reads and writes PDFs ☆1,892 · Updated last year
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,586 · Updated last month
- Convert html to docx ☆78 · Updated 10 months ago
- A library for converting HTML into PDFs using ReportLab ☆2,300 · Updated last week
- Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes ☆2,693 · Updated 2 weeks ago
- A python wrapper for libmagic ☆2,753 · Updated 2 months ago
- Python library for parsing .docx (Office Open XML) files ☆51 · Updated 5 years ago
- A package acting as a wrapper around the headless mode of existing web browsers to generate images from URLs and from HTML+CSS strings or… ☆394 · Updated 8 months ago
- Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors ☆1,224 · Updated this week
- Convert your vector images ☆826 · Updated 5 months ago
- extract text from any document. no muss. no fuss. ☆4,126 · Updated 5 months ago
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆444 · Updated last year
- ASCII transliterations of Unicode text - GitHub mirror ☆560 · Updated 3 weeks ago
- Wkhtmltopdf python wrapper to convert html to pdf ☆2,021 · Updated last year
- Simple PDF generation for Python (FPDF PHP port) ☆879 · Updated 8 months ago