ankushshah89/python-docx2txt

A pure-Python utility to extract text and images from docx files.

☆586

Alternatives and similar repositories for python-docx2txt

Users that are interested in python-docx2txt are comparing it to the libraries listed below.

python-openxml / python-docx
Create and modify Word documents with Python
☆5,683 · Jun 17, 2025 · Updated last year
ShayHill / docx2python
Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images.
☆208 · Updated this week
deanmalmgren / textract
extract text from any document. no muss. no fuss.
☆4,670 · Jul 11, 2026 · Updated last week
microsoft / Simplify-Docx
Simplify DOCX files to JSON
☆265 · Sep 26, 2024 · Updated last year
mwilliamson / python-mammoth
Convert Word documents (.docx files) to HTML
☆1,111 · May 24, 2026 · Updated last month
prohandler / GS-Bulk-Emails
Google Apps Script scripts that send a number of emails from the specific number and track the open status of each email
☆17 · Dec 11, 2024 · Updated last year
Belval / pdf2image
A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
☆1,975 · Jul 23, 2024 · Updated last year
pdfminer / pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
☆7,002 · Mar 13, 2026 · Updated 4 months ago
py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,121 · Jun 30, 2026 · Updated 3 weeks ago
col16 / pypassage
Python module for working with bible references.
☆14 · Nov 7, 2020 · Updated 5 years ago
elapouya / python-docx-template
Use a docx as a jinja2 template
☆2,676 · Jul 7, 2026 · Updated 2 weeks ago
ibm-hyperknowledge / hkpy
A Python module to provide software abstractions to ease accessing hyperknowledge graphs
☆11 · Dec 19, 2024 · Updated last year
elapouya / python-textops3
Python module to manipulate text, strings and lists of strings
☆21 · May 10, 2022 · Updated 4 years ago
euske / pdfminer
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
☆5,283 · Dec 7, 2022 · Updated 3 years ago
henrihapponen / docxedit
Edit Word (.docx) documents effortlessly without changing the original formatting.
☆23 · Mar 7, 2026 · Updated 4 months ago
mscarey / legislice
API client for fetching and comparing passages from legislation
☆14 · Jun 29, 2026 · Updated 3 weeks ago
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables.
☆10,575 · Updated this week
jalan / pdftotext
☆1,063 · Jun 28, 2026 · Updated 3 weeks ago
chrismattmann / tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
☆1,661 · Jul 1, 2026 · Updated 2 weeks ago
mscarey / justopinion
Download client for legal opinions
☆13 · Jun 12, 2026 · Updated last month
streamlit / example-app-interactive-table
☆18 · Jan 12, 2024 · Updated 2 years ago
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,283 · Updated this week
neelguha / legal-segmenter
A simple library for segmenting legal texts
☆18 · Apr 22, 2023 · Updated 3 years ago
scanny / python-pptx
Create Open XML PowerPoint documents in Python
☆3,462 · Aug 7, 2024 · Updated last year
terrierteam / pyterrier_t5
☆17 · Apr 30, 2026 · Updated 2 months ago
skececi / gptfree
Building or integrating an LLM wrapper shouldn't take more than 10 minutes.
☆13 · Feb 1, 2025 · Updated last year
JessicaTegner / pypandoc
Thin wrapper for "pandoc" (MIT)
☆1,146 · Jul 6, 2026 · Updated 2 weeks ago
textstat / textstat
Python package to calculate readability statistics of a text object - paragraphs, sentences, articles.
☆1,375 · Feb 18, 2026 · Updated 5 months ago
HazyResearch / pdftotree
A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved.
☆460 · Aug 3, 2023 · Updated 2 years ago
antiproblemist / excel2sql
Convert Excel workbook (all sheets) to SQLite database
☆11 · Mar 19, 2016 · Updated 10 years ago
jcushman / pdfquery
A fast and friendly PDF scraping library.
☆781 · Oct 17, 2023 · Updated 2 years ago
JustlyAI / lmss_entity_extractor
Tool to apply Legal Matter Specification Standard (LMSS) to documents
☆12 · Aug 15, 2024 · Updated last year
doccano / doccano
Open source annotation tool for machine learning practitioners.
☆10,709 · Apr 14, 2026 · Updated 3 months ago
thomasthiebaud / spacy-fastlang
Language detection using Spacy and Fasttext
☆54 · Dec 17, 2023 · Updated 2 years ago
chartbeat-labs / textacy
NLP, before and after spaCy
☆2,239 · Sep 22, 2023 · Updated 2 years ago
mikemaccana / python-docx
Reads, queries and modifies Microsoft Word 2007/2008 docx files.
☆1,076 · Sep 4, 2015 · Updated 10 years ago
fhamborg / Giveme5W
Extraction of the five journalistic W-questions (5W) from news articles
☆19 · May 16, 2018 · Updated 8 years ago
thisissoon / Flask-HAL
Flask Extension to easily add support for REST HATEOAS via the HAL Specification: https://tools.ietf.org/html/draft-kelly-json-hal-07
☆20 · May 25, 2018 · Updated 8 years ago
blester125 / iobes
Tool for parsing and converting various span encoding schemes.
☆23 · Jan 13, 2024 · Updated 2 years ago