Tools to process books in a cloud-based pipeline system
☆66Dec 4, 2025Updated 3 months ago
Alternatives and similar repositories for bookpipeline
Users who are interested in bookpipeline compare it to the libraries listed below
- A Vue IIIF viewer☆14Dec 14, 2025Updated 3 months ago
- Self-hosting code for Recogito-Studio☆21Updated this week
- Selected code and data for The Online Books Page and related applications☆11Mar 1, 2026Updated 3 weeks ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)☆201May 21, 2025Updated 10 months ago
- Command Line Interface for running 🤗 Transformers Image Classification locally☆19May 8, 2025Updated 10 months ago
- Tools for working with CSV files☆17Sep 19, 2012Updated 13 years ago
- File detector, metadata collector and well-formedness checker tool☆18Feb 3, 2026Updated last month
- Documentation and use cases for ALTO XML☆42Sep 10, 2018Updated 7 years ago
- Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)☆15Jan 20, 2026Updated 2 months ago
- Pre-Ingest Tool for creating submission information packages☆22Sep 13, 2024Updated last year
- Given the URL to a public JSON document in an International Image Interoperability Framework (IIIF) image server, this script will downlo…☆17Sep 6, 2022Updated 3 years ago
- ALTO XML schema - latest and all former versions☆55Jan 20, 2026Updated 2 months ago
- Make a searchable pdf via Google Cloud Vision OCR☆14Jan 17, 2020Updated 6 years ago
- Reading mdict files, support MDX/MDD file formats.☆18Feb 3, 2026Updated last month
- Convert between Tesseract hOCR and ALTO XML using XSL stylesheets☆59Sep 25, 2025Updated 5 months ago
- Decodes Compact Disc data from microscope images of a CD's surface☆12Jan 14, 2023Updated 3 years ago
- Goobi viewer - Presentation software for digital libraries, museums, archives and galleries. Open Source.☆25Updated this week
- Digitization information system built on top of the Fedora repository☆16Jan 15, 2019Updated 7 years ago
- Docker setup for OCR4all bundled with Larex☆22Jan 29, 2024Updated 2 years ago
- IIIF experiments with Gallica content☆31Nov 16, 2025Updated 4 months ago
- Umbrella repository that describes the collections contained in any given release of ELTeC☆13Jan 26, 2022Updated 4 years ago
- Python tool for batch visual question answering (BVQA).☆14Sep 18, 2025Updated 6 months ago
- Transkriptionen von Fibeln (19. Jahrhundert)☆11Oct 31, 2025Updated 4 months ago
- An open-source, browser-based front-end application for the collection of complex structured data from textual resources in history and t…☆16Updated this week
- Share a view of a IIIF document with a short link☆13May 2, 2024Updated last year
- ☆18Oct 9, 2018Updated 7 years ago
- A Mirador 3 plugin that adds annotation creation tools to the user interface☆43Feb 12, 2026Updated last month
- A IIIF static tile and manifest generator built using Python to generate IIIF tiled images and manifests. This application was put toget…☆10Mar 2, 2026Updated 2 weeks ago
- Generate a IIIF manifest for a Wikipedia entry☆10Jun 7, 2018Updated 7 years ago
- An Unofficial, Fanmade Build Creator/Planner for Cyberpunk 2077☆12Mar 15, 2024Updated 2 years ago
- Check your modified Ground Truth files with visual support!☆10Jan 31, 2024Updated 2 years ago
- Builds a Simple Archive Format package from files and a spreadsheet☆47Apr 27, 2023Updated 2 years ago
- ☆10Aug 5, 2019Updated 6 years ago
- A Flask web app that integrates Tesseract OCR to extract text from image files.☆10May 14, 2023Updated 2 years ago
- Image comparison QA tool for digital preservation workflows.☆14Nov 17, 2014Updated 11 years ago
- Leveraging LLMs for Post-OCR Correction of Historical Newspapers☆15Jun 20, 2024Updated last year
- ☆10Updated this week
- A basic editor for samvera objects.☆10Feb 4, 2026Updated last month
- Master repository which includes most other OCR-D repositories as submodules☆72Jul 4, 2025Updated 8 months ago