AlJohri / docx2pdf
☆528 Updated 6 months ago
Related projects
Alternatives and complementary repositories for docx2pdf
- Demos, examples and utilities using PyMuPDF ☆578 Updated 4 months ago
- Convert Word documents (.docx files) to HTML ☆818 Updated 5 months ago
- Extract tables from scanned image PDFs using Optical Character Recognition. ☆267 Updated 4 years ago
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,511 Updated 7 months ago
- A python module that wraps the pdftoppm utility to convert PDF to PIL Image object ☆1,645 Updated 4 months ago
- Simple PDF text extraction ☆872 Updated last month
- Use a docx as a jinja2 template ☆2,015 Updated last week
- A utility to read and write PDFs with Python ☆332 Updated 3 years ago
- Python script to do PDF OCR conversion using Tesseract ☆373 Updated last year
- Python bindings to PDFium ☆427 Updated 3 weeks ago
- Simplify DOCX files to JSON ☆219 Updated last month
- Turn images of tables into CSV data. Detect tables from images and run OCR on the cells. ☆501 Updated 3 years ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆371 Updated 3 months ago
- Create and modify Word documents with Python ☆4,645 Updated 3 months ago
- Convert xls file to xlsx (in python 3) ☆54 Updated 7 months ago
- Pure-python library for adding annotations to PDFs ☆198 Updated 3 years ago
- pdfrw is a pure Python library that reads and writes PDFs ☆1,871 Updated 6 months ago
- A pure python based utility to extract text and images from docx files. ☆516 Updated last year
- Simple command line utility for converting .doc & .xls files to any supported format such as Text, RTF, CSV or PDF ☆449 Updated last week
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆167 Updated this week
- A fast and friendly PDF scraping library. ☆772 Updated last year
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆299 Updated last year
- Python library to extract tabular data from images and scanned PDFs ☆264 Updated 3 months ago
- Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame ☆2,196 Updated last month
- ☆590 Updated 3 weeks ago
- A post-processing tool for scanned sheets of paper. ☆1,038 Updated 4 months ago
- Extract structured data from PDF invoices ☆1,846 Updated 2 weeks ago
- Tesseract documentation ☆1,839 Updated last week
- Mail merge for Office Open XML (docx) files without the need for Microsoft Office Word. ☆273 Updated 4 months ago
- The simplest way to extract text from PDFs in Python ☆427 Updated 2 years ago