py-pdf / sample-files
Files which can be used to test PDF readers
☆43Updated 2 months ago
Alternatives and similar repositories for sample-files
Users that are interested in sample-files are comparing it to the libraries listed below
- A simple python wrapper for PDFium.☆17Updated 3 years ago
- Easy to use PDF CLI tool powered by PDFium and go-pdfium☆27Updated 3 months ago
- RUPS is an acronym for Reading and Updating PDF Syntax. RUPS is a tool built on top of iText® that allows you to look inside a PDF docume…☆312Updated last week
- JBIG2 Encoder☆20Updated 2 months ago
- Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an ou…☆179Updated this week
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.☆392Updated 9 months ago
- Test and benchmark various PDF page rasterization utilities☆19Updated 10 years ago
- This is a mirror: the canonical repo is: git.ghostscript.com/jbig2dec.git. This repo does not host releases, they are here: https://githu…☆43Updated 10 months ago
- Library used to deskew a scanned document☆468Updated this week
- Document image dewarping library using a cubic sheet model☆158Updated this week
- A curated list of resources around PDF files☆133Updated 10 months ago
- Tools for extracting figures, tables, text, etc. from a PDF document.☆32Updated 4 years ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis☆345Updated 2 years ago
- Convert omml to latex for displaying in web browsers (KaTeX)☆31Updated 4 years ago
- The scripts for training Detectron2-based Layout Models on popular layout analysis datasets☆211Updated last year
- Parsing Module of Microsoft SQL Server Transaction log☆9Updated 2 years ago
- ☆10Updated 4 years ago
- This is a mirror: the canonical repo is: git.ghostscript.com/ghostpdl.git https://www.ghostscript.com☆138Updated this week
- ☆16Updated 4 months ago
- Tutorial on how to deskew (straighten) text images☆51Updated 3 years ago
- Python binding to Poppler-cpp pdf library☆108Updated 8 months ago
- ScanTailor Universal - a fork based on Enhanced+Featured+Master versions of ST☆213Updated 2 months ago
- Scan Tailor Experimental is an interactive post-processing tool for scanned pages.☆69Updated last week
- YOLOv11 trained on DocLayNet dataset.☆40Updated 7 months ago
- faster page_dewarp in C++☆32Updated 3 years ago
- Python library to extract tabular data from images and scanned PDFs☆278Updated 10 months ago
- A step-by-step C# implementation of the Docstrum algorithm☆23Updated 4 years ago
- Utilities for manipulating PostScript documents☆47Updated 3 weeks ago
- Main TWAIN Direct repository☆31Updated 2 years ago
- Download Poppler binaries packaged for Windows with dependencies☆815Updated 6 months ago