Data Mining Historical Newspaper Metadata (METS/ALTO formats)
☆25 · Feb 6, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for EN-data_mining
Users who are interested in EN-data_mining are comparing it to the libraries listed below
- Conversions between various OCR formats ☆83 · Feb 13, 2026 · Updated 2 weeks ago
- Awesome AI in Libraries ☆17 · Jul 21, 2023 · Updated 2 years ago
- OCR-D post-correction module based on weighted finite-state transducers ☆11 · Jan 13, 2024 · Updated 2 years ago
- OCRopus model for Gothic print (Fraktur) ☆19 · Feb 16, 2020 · Updated 6 years ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 · Feb 21, 2018 · Updated 8 years ago
- Tentative way towards a shared API for prosopographical data based on the factoid model (Bradley/Short 2005) ☆24 · Aug 25, 2022 · Updated 3 years ago
- Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames for data analysis ☆13 · Aug 21, 2025 · Updated 6 months ago
- Named Entity Recognition tool for Europeana Newspapers ☆14 · Apr 5, 2018 · Updated 7 years ago
- TIFY is a slim and mobile-friendly IIIF document viewer. ☆123 · Updated this week
- Guess a person's gender by their first name. Caveats apply. ☆18 · May 6, 2023 · Updated 2 years ago
- Vue-based Web Component for creating narrative presentations of images and maps ☆15 · May 1, 2025 · Updated 10 months ago
- Command Line Interface (CLI) to export METS/ALTO documents to other formats. ☆13 · Apr 25, 2022 · Updated 3 years ago
- Elections data from the early American republic ☆15 · May 30, 2019 · Updated 6 years ago
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆56 · May 30, 2023 · Updated 2 years ago
- Web service for creating and hosting IIIF manifests from METS/MODS documents ☆36 · Dec 8, 2022 · Updated 3 years ago
- “Open terminals”, “load CSVs”, “start hacking” ☆16 · May 2, 2017 · Updated 8 years ago
- A library for extracting structured data from museum provenance records. ☆36 · Jan 13, 2018 · Updated 8 years ago
- Java based viewer for PAGE XML files (layout + text content). Also supports ALTO XML, FineReader XML, and HOCR. ☆35 · May 25, 2023 · Updated 2 years ago
- OCR-D python tools ☆33 · Aug 16, 2024 · Updated last year
- A simple IIIF and Mirador compatible Annotation Server ☆100 · Dec 19, 2025 · Updated 2 months ago
- Special Topics in AI: Artificial Intelligence as an Archival Science ☆20 · May 13, 2024 · Updated last year
- ARCHIVED Extract Text from 'PDFs' ☆21 · May 10, 2022 · Updated 3 years ago
- Double-checked Gold Standard Data for Training and Testing OCR Engines ☆21 · Dec 31, 2022 · Updated 3 years ago
- IIIF Examples and useful code ☆20 · Sep 10, 2025 · Updated 5 months ago
- Web application for transcribing OCR ground truth from Archive.org ☆17 · Feb 22, 2018 · Updated 8 years ago
- Collection of hand-analyzed ancient Greek prose in dependency trees. ☆19 · Aug 15, 2022 · Updated 3 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆200 · May 21, 2025 · Updated 9 months ago
- Python tools for performing various operations on ALTO XML files ☆49 · Feb 27, 2025 · Updated last year
- This is a stand-alone OAI-PMH data provider. It serves records in any metadata format from directories of XML files using the directory n… ☆18 · Aug 27, 2025 · Updated 6 months ago
- OCR post correction for old German corpus ☆19 · Aug 29, 2022 · Updated 3 years ago
- (ICFHR 2020 oral) Code for "docExtractor: An off-the-shelf historical document element extraction" paper ☆88 · May 25, 2023 · Updated 2 years ago
- Process, enhance and evaluate multiple OCR output. ☆24 · Dec 2, 2025 · Updated 2 months ago
- An extensible viewer for OCR-D mets.xml files ☆22 · May 30, 2024 · Updated last year
- A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books. ☆195 · Updated this week
- DFKI Layout Detection for OCR-D ☆47 · May 1, 2025 · Updated 10 months ago
- Text-Induced Corpus Clean-up ☆20 · Jun 20, 2023 · Updated 2 years ago