Convert ALTO XML to plain text + minimal metadata
☆17Oct 17, 2024Updated last year
Alternatives and similar repositories for alto2txt
Users that are interested in alto2txt are comparing it to the libraries listed below
- Python tools for performing various operations on ALTO XML files☆49Feb 27, 2025Updated last year
- Locolligo is a single-page, browser-based javascript application to facilitate the formatting, linking, and geolocation of datasets, with…☆15Feb 19, 2024Updated 2 years ago
- The GeoNewsMiner (GNM): An interactive spatial humanities tool to visualize geographical references in historical newspapers☆18Feb 21, 2022Updated 4 years ago
- Efficient hOCR tooling☆55Aug 18, 2025Updated 6 months ago
- Text conversion tool (from e.g. Word, HTML, txt) to corpus formats (TEI or FoLiA)☆23Feb 11, 2022Updated 4 years ago
- A CLI tool that generates IIIF Presentation 2.1 Manifests from METS/MODS☆24Apr 17, 2025Updated 10 months ago
- QA-tool for scans with corresponding ALTO-files☆26Dec 2, 2022Updated 3 years ago
- OAI-PMH 2.0 harvester module for nodejs☆23Dec 30, 2022Updated 3 years ago
- View HOCR files with Mirador☆29Sep 27, 2017Updated 8 years ago
- The code used to evaluate embedding models on the Massive Legal Embedding Benchmark (MLEB).☆31Feb 24, 2026Updated 2 weeks ago
- Scripts to create git repositories for ALTO XML texts, like those from the British Library's scanned documents.☆31Nov 3, 2017Updated 8 years ago
- Convert your local XML file into an HTML table, export XML as CSV/JSON, and visualize XML in 2D/3D force-directed d3 graphs☆10Aug 24, 2022Updated 3 years ago
- jpdfbookmarks - fixes JPdfBookmarks GUI mode, where opening a PDF whose bookmarks include CJK (Chinese, Japanese, Korean) characters will show like…☆11Sep 4, 2023Updated 2 years ago
- A Python-based voice assistant integrating speech-to-text (STT), text-to-speech (TTS), and powerful AI capabilities using either a local …☆17Dec 8, 2025Updated 3 months ago
- European Parliament website Python scraper☆12Oct 19, 2016Updated 9 years ago
- Web service for creating and hosting IIIF manifests from METS/MODS documents☆36Dec 8, 2022Updated 3 years ago
- A metadata record management system written in PHP, intended to be used in conjunction with VuFind or another Solr-based discovery interf…☆50Mar 2, 2026Updated last week
- Documentation and use cases for ALTO XML☆42Sep 10, 2018Updated 7 years ago
- Conversions between various OCR formats☆84Feb 13, 2026Updated 3 weeks ago
- OCR-D python tools☆33Aug 16, 2024Updated last year
- A collection of OCR'd and machine-corrected Greek texts. This base repository contains Git submodules for the different works and an inve…☆11Nov 18, 2014Updated 11 years ago
- Gimp plugins to extract text from images (Bubble/Balloons)☆12Jul 7, 2024Updated last year
- Faster access to Tesseract-OCR from Python☆13Jun 8, 2021Updated 4 years ago
- An Alfred workflow--and a command line utility--to easily find recently modified files.☆13Sep 15, 2023Updated 2 years ago
- Convert Transkribus PAGE-XML to standard PAGE-XML☆12Dec 10, 2025Updated 3 months ago
- Project to digitize avant-garde periodicals☆12May 13, 2022Updated 3 years ago
- Repository for deepdoctection tutorial notebooks☆52Jan 1, 2026Updated 2 months ago
- A package manager built for the command-line JSON processor jq.☆45Jun 1, 2021Updated 4 years ago
- Small collection of PAGE XML related scripts used at the ZPD Würzburg☆12Aug 2, 2024Updated last year
- Grav plugin for fetching and displaying Wordpress posts. http://getgrav.org☆10Sep 29, 2017Updated 8 years ago
- a little nodejs server and script that extracts letters from images via tesseract☆19Mar 4, 2015Updated 11 years ago
- Tool that significantly reduces the size of your Clone Hero songs library (by approximately 45%).☆10Dec 3, 2024Updated last year
- Drag-and-drop to find text. A work in progress.☆14Oct 6, 2022Updated 3 years ago
- Format specifiers to use with json-schema-validator☆12Jul 1, 2013Updated 12 years ago
- A Python helper library to convert between ISO 639 two- and three-letter codes.☆11Nov 13, 2024Updated last year
- macOS JavaScript for Automation (JXA) bundler. Creates macOS apps and command-line scripts. Allows using libraries from NPM.☆10Jun 8, 2022Updated 3 years ago
- convert PubLayNet data into METS/PAGE-XML☆10Mar 17, 2020Updated 5 years ago