mwilliamson/python-mammoth

Convert Word documents (.docx files) to HTML

☆1,111

Alternatives and similar repositories for python-mammoth

Users who are interested in python-mammoth are comparing it with the libraries listed below.

mwilliamson / mammoth.js
Convert Word documents (.docx files) to HTML
☆6,265 · May 24, 2026 · Updated last month
python-openxml / python-docx
Create and modify Word documents with Python
☆5,683 · Jun 17, 2025 · Updated last year
CenterForOpenScience / pydocx
An extendable docx file format parser and converter
☆194 · Jun 25, 2026 · Updated 3 weeks ago
ShayHill / docx2python
Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images.
☆208 · Updated this week
elapouya / python-docx-template
Use a docx as a jinja2 template
☆2,676 · Jul 7, 2026 · Updated 2 weeks ago
lsaint / python-docx-oss
CRUD Word documents with Python
☆13 · Feb 5, 2026 · Updated 5 months ago
ankushshah89 / python-docx2txt
A pure python based utility to extract text and images from docx files.
☆586 · Mar 24, 2025 · Updated last year
alea-institute / kl3m-data
KL3M training data collection and preprocessing
☆22 · Apr 14, 2025 · Updated last year
unoconv / unoconv
Universal Office Converter - Convert between any document format supported by LibreOffice/OpenOffice.
☆2,746 · Apr 19, 2023 · Updated 3 years ago
pdfminer / pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
☆7,002 · Mar 13, 2026 · Updated 4 months ago
matthewwithanm / python-markdownify
Convert HTML to Markdown
☆2,225 · Jun 30, 2026 · Updated 3 weeks ago
py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,121 · Jun 30, 2026 · Updated 3 weeks ago
microsoft / Simplify-Docx
Simplify DOCX files to JSON
☆265 · Sep 26, 2024 · Updated last year
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,283 · Updated this week
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆10,575 · Updated this week
pypdfium2-team / pypdfium2
Python bindings to PDFium, reasonably cross-platform.
☆799 · Updated this week
JessicaTegner / pypandoc
Thin wrapper for "pandoc" (MIT)
☆1,146 · Jul 6, 2026 · Updated 2 weeks ago
erezlife / html2docx
Convert HTML to docx
☆38 · Jan 16, 2024 · Updated 2 years ago
neelguha / legal-segmenter
A simple library for segmenting legal texts
☆18 · Apr 22, 2023 · Updated 3 years ago
Kozea / WeasyPrint
The awesome document factory
☆9,408 · Updated this week
pikepdf / pikepdf
A Python library for reading and writing PDF, powered by QPDF
☆2,766 · Updated this week
deanmalmgren / textract
extract text from any document. no muss. no fuss.
☆4,670 · Jul 11, 2026 · Updated last week
ArtifexSoftware / pdf2docx
Open source Python library for converting PDF to DOCX.
☆3,469 · May 1, 2026 · Updated 2 months ago
Unstructured-IO / unstructured
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean…
☆15,176 · Updated this week
camelot-dev / camelot
A Python library to extract tabular data from PDFs
☆3,786 · Updated this week
opendatalab / PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
☆9,797 · Jan 3, 2025 · Updated last year
arrow-py / arrow
🏹 Better dates & times for Python
☆9,051 · Jun 22, 2026 · Updated last month
JazzCore / python-pdfkit
Wkhtmltopdf python wrapper to convert html to pdf
☆2,041 · Oct 26, 2023 · Updated 2 years ago
executablebooks / markdown-it-py
Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed. Now in Python!
☆1,340 · Updated this week
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,130 · Updated this week
weblyzard / inscriptis
A python based HTML to text conversion library, command line client and Web service.
☆345 · Jun 22, 2026 · Updated last month
Belval / pdf2image
A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
☆1,975 · Jul 23, 2024 · Updated last year
xhtml2pdf / xhtml2pdf
A library for converting HTML into PDFs using ReportLab
☆2,388 · Jan 19, 2026 · Updated 6 months ago
coolwanglu / pdf2htmlEX
Convert PDF to HTML without losing text or format.
☆10,606 · Jun 2, 2023 · Updated 3 years ago
Python-Markdown / markdown
A Python implementation of John Gruber’s Markdown with Extension support.
☆4,226 · Jul 8, 2026 · Updated 2 weeks ago
euske / pdfminer
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
☆5,283 · Dec 7, 2022 · Updated 3 years ago
boristsr / FaceLean
An experiment to use a webcam as a game input device.
☆12 · Nov 22, 2022 · Updated 3 years ago
toxicphreAK / python-docx-ng
Create and modify Word documents with Python (next-gen)
☆23 · Sep 28, 2023 · Updated 2 years ago
lepture / mistune
A fast yet powerful Python Markdown parser with renderers and plugins.
☆3,057 · Updated this week