JoshData / pdf-redactor
A general-purpose PDF text-layer redaction tool for Python 2/3.
☆196 · Updated 10 months ago
Alternatives and similar repositories for pdf-redactor:
Users interested in pdf-redactor are comparing it to the libraries listed below:
- Pure-Python library for adding annotations to PDFs ☆201 · Updated 4 years ago
- Python library to extract tabular data from images and scanned PDFs ☆277 · Updated 8 months ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆314 · Updated last year
- A fast and friendly PDF scraping library. ☆777 · Updated last year
- Python binding to Poppler-cpp pdf library ☆109 · Updated 7 months ago
- A Python tool to help extract information from structured PDFs. ☆402 · Updated 3 weeks ago
- Simple PDF text extraction ☆922 · Updated 2 months ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆389 · Updated 8 months ago
- Turn images of tables into CSV data. Detect tables from images and run OCR on the cells. ☆521 · Updated 4 years ago
- CSS-related utilities (parsing, serialization, etc.) for Python ☆31 · Updated 7 months ago
- PDF to XML ALTO file converter ☆236 · Updated last week
- Python script to do PDF OCR conversion using Tesseract ☆374 · Updated last year
- Python interface to Apache PDFBox command-line tools. ☆75 · Updated 2 years ago
- Extract price amount and currency symbol from a raw text string ☆325 · Updated 2 months ago
- Convert HTML to DOCX ☆77 · Updated 9 months ago
- Python API for PDF documents ☆119 · Updated 7 months ago
- Demos, examples and utilities using PyMuPDF ☆646 · Updated 9 months ago
- PDF minifier that allows removing duplicate data, re-compressing images, creating PDF/A-1b files and digitally signing PDFs ☆55 · Updated 7 months ago
- PDF parser and converter to HTML ☆85 · Updated 6 months ago
- THIS REPOSITORY IS A FORK ☆30 · Updated 2 years ago
- Convert a PDF via OCR to a TXT file in UTF-8 encoding ☆148 · Updated last year
- Python bindings to PDFium ☆560 · Updated this week
- Reading legal authority for the last time ☆36 · Updated last month
- Extract dates from text ☆64 · Updated 4 years ago
- Python package for Google's diff-match-patch native C++ implementation. ☆75 · Updated 10 months ago
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆82 · Updated 7 years ago
- Extract structured data from PDF invoices ☆1,949 · Updated 2 weeks ago
- Pre-Recognize Library - library with algorithms for improving OCR quality. ☆104 · Updated last year
- The scripts for training Detectron2-based Layout Models on popular layout analysis datasets ☆210 · Updated last year
- Textricator is a tool to extract text from documents and generate structured data. ☆347 · Updated last month