py-pdf / sample-files
Files which can be used to test PDF readers
☆44 · Updated 3 months ago
Alternatives and similar repositories for sample-files
Users who are interested in sample-files are comparing it to the libraries listed below
- Industry-based resolutions for issues and errata reported against any PDF-related specification ☆73 · Updated last week
- Easy to use PDF CLI tool powered by PDFium and go-pdfium ☆27 · Updated 4 months ago
- Document image dewarping library using a cubic sheet model ☆160 · Updated this week
- Get semantic HTML from PDFs, recover lost text, tables, data... in bulk. ☆31 · Updated 7 months ago
- RUPS is an acronym for Reading and Updating PDF Syntax. RUPS is a tool built on top of iText® that allows you to look inside a PDF docume… ☆315 · Updated 3 weeks ago
- Scan Tailor Experimental is an interactive post-processing tool for scanned pages. ☆74 · Updated this week
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆394 · Updated 10 months ago
- Library used to deskew a scanned document ☆470 · Updated this week
- Document Image Binarization ☆77 · Updated 8 months ago
- Pre-Recognize Library - library with algorithms for improving OCR quality. ☆106 · Updated 2 years ago
- gcv2hocr converts from Google Cloud Vision OCR output to hocr to make a searchable pdf. ☆106 · Updated 4 years ago
- A step-by-step C# implementation of the Docstrum algorithm ☆23 · Updated 4 years ago
- A curated list of resources around PDF files ☆134 · Updated 10 months ago
- ☆18 · Updated 5 months ago
- ☆750 · Updated 2 months ago
- OCRmyPDF EasyOCR plugin ☆86 · Updated 2 months ago
- Test and benchmark various PDF page rasterization utilities ☆19 · Updated 10 years ago
- Tools for extracting figures, tables, text, etc. from a PDF document. ☆32 · Updated 4 years ago
- The hOCR Embedded OCR Workflow and Output Format ☆73 · Updated 10 months ago
- Web based JavaScript GUI library for proofreading/editing hOCR ☆95 · Updated 6 years ago
- Document Layout Analysis ☆376 · Updated 2 weeks ago
- A vendor- and implementation-independent specification-derived, machine-readable model of PDF. ☆85 · Updated 3 weeks ago
- This is a mirror: the canonical repo is: git.ghostscript.com/ghostpdl.git https://www.ghostscript.com ☆139 · Updated this week
- PDF-Raster sample code for TWAIN-WG ☆12 · Updated 3 years ago
- Demos, examples and utilities using PyMuPDF ☆664 · Updated 11 months ago
- Converts MathML to LaTeX ☆94 · Updated last week
- Pure-python library for adding annotations to PDFs ☆202 · Updated 4 years ago
- faster page_dewarp in C++ ☆32 · Updated 3 years ago
- Convert omml to latex for displaying in web browsers (KaTeX) ☆31 · Updated 4 years ago
- An open source set of Java filters for creating, merging and validating XLIFF 1.2, 2.0, 2.1 and 2.2 files. ☆75 · Updated this week