CITlabRostock / citlab-article-separation-new
Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Horizon 2020 project NewsEye. For more information about the project see https://www.newseye.eu/.
☆22 Sep 2, 2022 Updated 3 years ago
Alternatives and similar repositories for citlab-article-separation-new
Users who are interested in citlab-article-separation-new are comparing it to the libraries listed below.
- OCRopus model for Gothic print (Fraktur) ☆19 Feb 16, 2020 Updated 5 years ago
- Training data from "Hauptphase I" of project "Digitalisierung historischer deutscher Zeitungen" ☆12 Dec 17, 2021 Updated 4 years ago
- Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2) ☆14 Jan 20, 2026 Updated 3 weeks ago
- Convert Transkribus PAGE-XML to standard PAGE-XML ☆12 Dec 10, 2025 Updated 2 months ago
- convert PubLayNet data into METS/PAGE-XML ☆10 Mar 17, 2020 Updated 5 years ago
- An extensible viewer for OCR-D mets.xml files ☆22 May 30, 2024 Updated last year
- Layout analysis to find layout elements in documents (similar to P2PaLA) ☆20 Jan 7, 2026 Updated last month
- Reichsanzeiger-NLP: NER/NEL corpus for the German historical newspaper "Deutscher Reichsanzeiger und Preußischer Staatsanzeiger" (1819–19… ☆16 Oct 18, 2024 Updated last year
- Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s) ☆17 Sep 18, 2025 Updated 4 months ago
- A repository for online OCRD training infrastructure. ☆13 Aug 20, 2020 Updated 5 years ago
- ☆14 Jul 11, 2022 Updated 3 years ago
- Some bits of javascript to transcribe scanned pages using PageXML ☆17 Mar 18, 2024 Updated last year
- OCR-D post-correction module based on weighted finite-state transducers ☆11 Jan 13, 2024 Updated 2 years ago
- OCR-D wrapper for detectron2 based segmentation models ☆17 May 1, 2025 Updated 9 months ago
- Master repository which includes most other OCR-D repositories as submodules ☆72 Jul 4, 2025 Updated 7 months ago
- A Pythonic API and some command line tools to access the Transkribus server via its REST API ☆28 Nov 25, 2022 Updated 3 years ago
- Web application for transcribing OCR ground truth from Archive.org ☆17 Feb 22, 2018 Updated 7 years ago
- Recognize text using Calamari OCR and the OCR-D framework ☆15 May 13, 2025 Updated 9 months ago
- Double-checked Gold Standard Data for Training and Testing OCR Engines ☆21 Dec 31, 2022 Updated 3 years ago
- ☆66 Feb 3, 2026 Updated last week
- Transcriptions of primers (19th century) ☆11 Oct 31, 2025 Updated 3 months ago
- OCR-D python tools ☆33 Aug 16, 2024 Updated last year
- A CLI tool that generates IIIF Presentation 2.1 Manifests from METS/MODS ☆24 Apr 17, 2025 Updated 9 months ago
- An OCR evaluation tool ☆69 Aug 22, 2025 Updated 5 months ago
- ALTO XML schema - latest and all former versions ☆55 Jan 20, 2026 Updated 3 weeks ago
- Check your modified Ground Truth files with visual support! ☆10 Jan 31, 2024 Updated 2 years ago
- JournalTouch provides a touch-optimized interface for browsing current journal tables of contents in Responsive Design. Fun! ☆14 May 27, 2019 Updated 6 years ago
- Tools for TICCL ☆14 Dec 12, 2025 Updated 2 months ago
- Small collection of PAGE XML related scripts used at the ZPD Würzburg ☆12 Aug 2, 2024 Updated last year
- API wrapper enabling Wikisources to submit images for optical character recognition. ☆14 Feb 5, 2026 Updated last week
- texrex web page cleaning & ClaraX random walk crawler ☆11 Dec 13, 2021 Updated 4 years ago
- OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil ☆11 Sep 24, 2021 Updated 4 years ago
- ☆10 Mar 16, 2023 Updated 2 years ago
- Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames for data analysis ☆13 Aug 21, 2025 Updated 5 months ago
- DFKI Layout Detection for OCR-D ☆47 May 1, 2025 Updated 9 months ago
- Command line tool to convert page layout files to the latest PAGE XML format. It supports all previous versions of the PAGE format as wel… ☆24 Jan 30, 2021 Updated 5 years ago
- code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Oct 24, 2016 Updated 9 years ago
- Named Entity Recognition tool for Europeana Newspapers ☆14 Apr 5, 2018 Updated 7 years ago
- OCR-D post-correction with encoder-attention-decoder LSTMs ☆13 May 1, 2025 Updated 9 months ago