JoshData / pdf-diff
A PDF comparison utility in Python.
☆453 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for pdf-diff
- Python binding to libpoppler with focus on text extraction ☆98 · Updated 2 years ago
- a utility to extract the title from a PDF file ☆133 · Updated 3 weeks ago
- Python script to do PDF OCR conversion using Tesseract ☆373 · Updated last year
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆433 · Updated last year
- Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs. ☆1,041 · Updated last year
- Simple PDF text extraction ☆870 · Updated 3 weeks ago
- MOVED TO https://gitlab.com/crossref/pdfextract ☆509 · Updated 7 years ago
- compare two PDF files, write a resulting PDF with highlighted changes ☆54 · Updated 3 months ago
- Style package for directly including color emojis in latex documents ☆219 · Updated 5 years ago
- Pure-python library for adding annotations to PDFs ☆196 · Updated 3 years ago
- Pure Python library for LaTeX to MathML conversion ☆185 · Updated 2 weeks ago
- Textricator is a tool to extract text from documents and generate structured data. ☆347 · Updated 2 weeks ago
- Use a text editor. Make a PDF. ☆551 · Updated this week
- fault-tolerant Python3 package for searching, navigating, and modifying LaTeX documents ☆286 · Updated 7 months ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆369 · Updated 3 months ago
- A post-processing tool for scanned sheets of paper. ☆1,034 · Updated 3 months ago
- Quickly check whether there is a visible difference between two PDFs. ☆63 · Updated 9 months ago
- A general purpose PDF text-layer redaction tool for Python 2/3. ☆185 · Updated 4 months ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆299 · Updated last year
- Ocular is a state-of-the-art historical OCR system. ☆253 · Updated 5 months ago
- PDF to XML ALTO file converter ☆215 · Updated last month
- A library for extracting tables from PDF files ☆90 · Updated 11 years ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆275 · Updated 9 months ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆215 · Updated 4 years ago
- Python interface to Apache PDFBox command-line tools. ☆75 · Updated last year
- Extract tables from PDF pages. ☆276 · Updated 4 years ago
- The Python document processor ☆506 · Updated 3 weeks ago
- Scripts and results from our OCR roundup, available on Source ☆150 · Updated 5 years ago
- Simple, Pythonic extraction of text, shapes and images from PDFs ☆78 · Updated 4 years ago
- Turn images of tables into CSV data. Detect tables from images and run OCR on the cells. ☆497 · Updated 3 years ago