JoshData / pdf-redactor
A general-purpose PDF text-layer redaction tool for Python 2/3.
☆184 · Updated 3 months ago
Related projects:
- Python interface to Apache PDFBox command-line tools. ☆75 · Updated last year
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆363 · Updated last month
- Python script to do PDF OCR conversion using Tesseract ☆372 · Updated last year
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆428 · Updated last year
- A fast and friendly PDF scraping library. ☆769 · Updated 11 months ago
- Pure-Python library for adding annotations to PDFs ☆192 · Updated 3 years ago
- Python binding to the Poppler-cpp PDF library ☆95 · Updated 2 weeks ago
- Python module to drive the awesome pdftk binary. ☆145 · Updated last year
- Simple PDF text extraction ☆859 · Updated 4 months ago
- A pure-Python utility to extract text and images from docx files. ☆504 · Updated 11 months ago
- A Python tool to help extract information from structured PDFs. ☆368 · Updated 3 weeks ago
- A utility to read and write PDFs with Python ☆330 · Updated 2 years ago
- Working with hOCR in JavaScript ☆119 · Updated last year
- PDF to XML ALTO file converter ☆209 · Updated this week
- Python library to extract tabular data from images and scanned PDFs ☆255 · Updated last month
- pdfrw is a pure Python library that reads and writes PDFs ☆1,856 · Updated 4 months ago
- Python binding to libpoppler with focus on text extraction ☆98 · Updated 2 years ago
- Python bindings to PDFium ☆349 · Updated this week
- `pdfstructure` detects, splits and organizes the document's text content into its natural structure as envisioned by the author. ☆97 · Updated 5 months ago
- A Python library that makes filling PDFs much easier ☆129 · Updated last month
- Demos, examples and utilities using PyMuPDF ☆548 · Updated 2 months ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆290 · Updated 11 months ago
- A simple command-line mail merge tool. ☆138 · Updated 6 months ago
- Simplify DOCX files to JSON ☆211 · Updated 8 months ago
- Ultimate Website Sitemap Parser ☆178 · Updated last year
- Python address detector and parser ☆199 · Updated 9 months ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆215 · Updated 4 years ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆63 · Updated 3 years ago
- Simple Python wrapper to convert HTML to PDF with headless Chrome via Selenium ☆45 · Updated last year
- Python API for PDF documents ☆113 · Updated 2 weeks ago