JoshData / pdf-redactor
A general purpose PDF text-layer redaction tool for Python 2/3.
☆204 Updated last year
Alternatives and similar repositories for pdf-redactor
Users who are interested in pdf-redactor are comparing it to the libraries listed below.
- A Python tool to help extracting information from structured PDFs. ☆417 Updated last week
- Pure-python library for adding annotations to PDFs ☆208 Updated 4 years ago
- Python API for PDF documents ☆124 Updated last year
- Simple PDF text extraction ☆955 Updated 8 months ago
- A utility to read and write PDFs with Python ☆338 Updated 3 years ago
- A fast and friendly PDF scraping library. ☆782 Updated 2 years ago
- Convert a PDF via OCR to a TXT file in UTF-8 encoding ☆152 Updated 2 years ago
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆454 Updated 2 years ago
- Python interface to Apache PDFBox command-line tools. ☆78 Updated 2 years ago
- Simplify DOCX files to JSON ☆253 Updated last year
- Demos, examples and utilities using PyMuPDF ☆685 Updated last year
- Simple, Pythonic extraction of text, shapes and images from PDFs ☆80 Updated 5 years ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆299 Updated 5 months ago
- PDF to XML ALTO file converter ☆254 Updated last month
- A pure python based utility to extract text and images from docx files. ☆564 Updated 7 months ago
- Python binding to Poppler-cpp pdf library ☆113 Updated last year
- Multiple and Large PDF Documents Text Extraction. ☆131 Updated 8 months ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆216 Updated 5 years ago
- THIS REPOSITORY IS FORK ☆30 Updated 2 years ago
- python app/framework for 'all things ISBN' including metadata, descriptions, covers... ☆231 Updated 2 years ago
- Python binding to libpoppler with focus on text extraction ☆97 Updated 3 years ago
- A utility to read and write PDFs with Python ☆73 Updated last year
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆327 Updated 2 years ago
- Simple python wrapper to convert HTML to PDF with headless Chrome via selenium ☆47 Updated 2 years ago
- Convert html to docx ☆83 Updated last year
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆192 Updated last week
- Append/Concatenate .docx documents ☆119 Updated last year
- ☆582 Updated 2 weeks ago
- gcv2hocr converts from Google Cloud Vision OCR output to hocr to make a searchable pdf. ☆107 Updated 5 years ago
- Find parts of long text or data, allowing for some changes/typos. ☆332 Updated 5 months ago