JoshData / pdf-diff
A PDF comparison utility in Python.
☆468 Updated 3 months ago
Alternatives and similar repositories for pdf-diff:
Users who are interested in pdf-diff are comparing it to the libraries listed below.
- compare two PDF files, write a resulting PDF with highlighted changes ☆56 Updated 7 months ago
- Python binding to libpoppler with focus on text extraction ☆97 Updated 3 years ago
- Extract tables from PDF pages. ☆287 Updated 4 years ago
- MOVED TO https://gitlab.com/crossref/pdfextract ☆508 Updated 7 years ago
- Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs. ☆1,051 Updated last year
- Convert LaTeX documents into beautiful responsive web pages using LaTeXML. ☆1,086 Updated last year
- fault-tolerant Python3 package for searching, navigating, and modifying LaTeX documents ☆299 Updated last month
- Prose diffs for any document format supported by Pandoc ☆310 Updated this week
- Content ExtRactor and MINEr ☆493 Updated 2 years ago
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,562 Updated this week
- Document Layout Analysis ☆361 Updated this week
- A library and command line utility for diffing xml ☆208 Updated 10 months ago
- Text page dewarping using a "cubic sheet" model ☆1,464 Updated 2 years ago
- A fast and friendly PDF scraping library. ☆774 Updated last year
- Ocular is a state-of-the-art historical OCR system. ☆262 Updated 9 months ago
- Python script to do PDF OCR conversion using Tesseract ☆374 Updated last year
- Style package for directly including color emojis in latex documents ☆221 Updated 5 years ago
- Scripts and results from our OCR roundup, available on Source ☆150 Updated 6 years ago
- Simple PDF text extraction ☆914 Updated last month
- PDF Command Line Tools binaries for Linux, Mac, Windows ☆635 Updated 3 months ago
- pdfrw is a pure Python library that reads and writes PDFs ☆1,884 Updated 10 months ago
- A python script for checking BibLatex .bib files for common referencing mistakes! ☆177 Updated last year
- Camelot: PDF Table Extraction for Humans ☆3,678 Updated 2 years ago
- A web interface to extract tabular data from PDFs ☆1,645 Updated 2 months ago
- Generic framework for historical document processing ☆375 Updated 3 years ago
- The simplest way to extract text from PDFs in Python ☆427 Updated 2 years ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,284 Updated 2 years ago
- Query Google Scholar with Python ☆294 Updated last year
- Python interface to Graphviz's Dot language ☆951 Updated 2 weeks ago
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆82 Updated 7 years ago