ankushshah89 / python-docx2txt
A pure-Python utility to extract text and images from .docx files.
☆512 · Updated last year
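The page does not show how such a tool works. As an illustration of what "pure Python" .docx extraction involves, here is a minimal sketch that uses only the standard library. It is not docx2txt's own implementation. It assumes the standard Office Open XML package layout, where body text lives in `word/document.xml` and embedded images sit under `word/media/`. The function name `extract_docx` is hypothetical. The sketch reads only the main document part, so headers, footers and footnotes are skipped, and text inside text boxes (nested paragraphs) may appear twice.

```python
import zipfile
from pathlib import Path
from xml.etree import ElementTree as ET

# WordprocessingML main namespace used in word/document.xml (ECMA-376 / OOXML).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx(docx, img_dir=None):
    """Return the body text of a .docx file (path or binary file object).

    Hypothetical helper, not docx2txt's API. If img_dir is given, files under
    word/media/ (embedded images) are copied there. Only word/document.xml is
    read; nested paragraphs (e.g. text boxes) are not de-duplicated.
    """
    with zipfile.ZipFile(docx) as zf:  # a .docx is a ZIP package
        root = ET.fromstring(zf.read("word/document.xml"))
        lines = []
        for para in root.iter(W + "p"):          # one output line per paragraph
            parts = []
            for run in para.iter(W + "r"):       # runs, including those inside hyperlinks
                for child in run:                # only run content, so tab stops in w:pPr are ignored
                    if child.tag == W + "t":
                        parts.append(child.text or "")
                    elif child.tag == W + "tab":
                        parts.append("\t")
                    elif child.tag in (W + "br", W + "cr"):
                        parts.append("\n")
            lines.append("".join(parts))

        if img_dir is not None:
            out = Path(img_dir)
            out.mkdir(parents=True, exist_ok=True)
            for name in zf.namelist():
                if name.startswith("word/media/") and not name.endswith("/"):
                    # Copy the image bytes under their original file name.
                    (out / Path(name).name).write_bytes(zf.read(name))
    return "\n".join(lines)
```

Iterating over runs rather than over every descendant of a paragraph keeps tab-stop definitions in the paragraph properties from being mistaken for literal tab characters.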
Related projects
Alternatives and complementary repositories for python-docx2txt
- A Python tool to help extract information from structured PDFs. ☆379 · Updated last week
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆164 · Updated this week
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,506 · Updated 6 months ago
- Simple PDF text extraction ☆870 · Updated 3 weeks ago
- extract text from any document. no muss. no fuss. ☆3,905 · Updated this week
- A utility to read and write PDFs with Python ☆332 · Updated 2 years ago
- Adobe PDFServices Python SDK samples ☆131 · Updated this week
- A simple viewer and inspection tool for text boxes in PDF documents ☆92 · Updated 2 years ago
- Thin wrapper for "pandoc" (MIT) ☆893 · Updated last week
- Create and modify Word documents with Python ☆4,611 · Updated 2 months ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆215 · Updated 4 years ago
- Simplify DOCX files to JSON ☆219 · Updated last month
- Demos, examples and utilities using PyMuPDF ☆566 · Updated 4 months ago
- Find dates inside text using Python and get back datetime objects ☆635 · Updated 5 months ago
- An extendable docx file format parser and converter ☆188 · Updated 4 years ago
- Python API for PDF documents ☆116 · Updated 2 months ago
- Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate" etc. ☆623 · Updated 3 years ago
- Create, read, and modify Excel .xlsx files ☆104 · Updated 4 years ago
- Python binding to libpoppler with focus on text extraction ☆98 · Updated 2 years ago
- PyDictionary is a Dictionary Module for Python 2/3 to get meanings, translations, synonyms and antonyms of words ☆274 · Updated last year
- ☆165 · Updated 4 months ago
- A Python-based HTML-to-text conversion library, command-line client and web service. ☆276 · Updated 8 months ago
- Python interface to Apache PDFBox command-line tools. ☆75 · Updated last year
- Python 3 port of pdfminer ☆189 · Updated 6 years ago
- Clone of http://bitbucket.org/ericgazoni/openpyxl ☆173 · Updated 10 years ago
- A fast and friendly PDF scraping library. ☆773 · Updated last year
- Convert html to docx ☆73 · Updated 4 months ago
- A general purpose PDF text-layer redaction tool for Python 2/3. ☆185 · Updated 4 months ago
- English word segmentation, written in pure-Python, and based on a trillion-word corpus. ☆365 · Updated last year
- Pure-Python library for adding annotations to PDFs ☆196 · Updated 3 years ago