internetarchive / archive-pdf-tools
Fast PDF generation and compression. Deals with millions of pages daily.
☆117 Updated 10 months ago
Alternatives and similar repositories for archive-pdf-tools
Users who are interested in archive-pdf-tools are comparing it to the libraries listed below.
- Efficient hOCR tooling ☆44 Updated last month
- ScanTailor Universal - a fork based on Enhanced+Featured+Master versions of ST ☆214 Updated 2 months ago
- ScanTailor Advanced is the version that merges the features of the ScanTailor Featured and ScanTailor Enhanced versions, brings new ones … ☆242 Updated this week
- Building scantailor and its dependencies ☆58 Updated last year
- Scan Tailor Experimental is an interactive post-processing tool for scanned pages. ☆73 Updated this week
- Automatic de-keystoning for single camera DIY book scanners. ☆49 Updated 4 years ago
- A post-processing tool for scanned sheets of paper. ☆82 Updated last year
- smoothscan is a tool to convert scanned text into a vectorized output form. ☆67 Updated 11 years ago
- Tools to process books in a cloud based pipeline system ☆61 Updated 2 months ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆188 Updated last month
- Specifications developed and maintained by the Webrecorder community. ☆131 Updated 5 months ago
- Industry-based resolutions for issues and errata reported against any PDF-related specification ☆73 Updated this week
- Master repository which includes most other OCR-D repositories as submodules ☆73 Updated last month
- Ergonomic line-by-line transcription of scanned text. ☆52 Updated 4 years ago
- ☆46 Updated last year
- Conversions between various OCR formats ☆78 Updated 2 years ago
- The hOCR Embedded OCR Workflow and Output Format ☆73 Updated 10 months ago
- search interface for scholarly works ☆85 Updated 10 months ago
- Format Identification for Digital Objects (FIDO) is a Python command-line tool to identify the file formats of digital objects. It is des… ☆157 Updated 3 months ago
- DV analyzer ☆11 Updated 6 months ago
- Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an ou… ☆179 Updated 2 weeks ago
- Modular workflow assistant for book digitization ☆126 Updated 9 years ago
- guides and test data for OCR4all ☆31 Updated 2 years ago
- File validation and characterisation. ☆180 Updated last month
- Create Robust Links from within Zotero ☆20 Updated 3 years ago
- Collection of resources, papers, blog posts, and other documentation around working on and with Archivematica. ☆21 Updated last year
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆85 Updated 2 months ago
- A vendor- and implementation-independent specification-derived, machine-readable model of PDF. ☆85 Updated 2 weeks ago
- Converts WARC files to static HTML ☆44 Updated 11 months ago
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆32 Updated last month