CITlabRostock / citlab-article-separation-new
Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Horizon 2020 project NewsEye. For more information about the project see https://www.newseye.eu/.
☆18 Updated 2 years ago
Related projects
Alternatives and complementary repositories for citlab-article-separation-new
- Training data from "Hauptphase I" of project "Digitalisierung historischer deutscher Zeitungen" ☆12 Updated 2 years ago
- You Actually Look Twice At it ☆29 Updated last month
- Layout analysis to find layout elements in documents (similar to P2PaLA) ☆17 Updated this week
- Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2) ☆13 Updated last month
- OCRopus model for Gothic print (Fraktur) ☆18 Updated 4 years ago
- Some bits of JavaScript to transcribe scanned pages using PageXML ☆17 Updated 8 months ago
- Named entity annotation tool ☆27 Updated last year
- An extensible viewer for OCR-D mets.xml files ☆20 Updated 5 months ago
- Reichsanzeiger-NLP: NER/NEL corpus for the German historical newspaper "Deutscher Reichsanzeiger und Preußischer Staatsanzeiger" (1819–19… ☆14 Updated last month
- Named Entity Recognition ☆16 Updated last week
- OCR-D wrapper for prima-pagetopdf ☆8 Updated 3 weeks ago
- Transcriptions of primers (19th century) ☆11 Updated 9 months ago
- Python tools for performing various operations on ALTO XML files ☆39 Updated last year
- Conversions between various OCR formats ☆71 Updated last year
- Docker integration of Kitodo.Production and OCR-D ☆9 Updated 8 months ago
- CERberus -- guardian against character errors ☆26 Updated 9 months ago
- Check your modified Ground Truth files with visual support! ☆10 Updated 9 months ago
- A Pythonic API and some command line tools to access the Transkribus server via its REST API ☆27 Updated last year
- Named Entity Recognition tool for Europeana Newspapers ☆14 Updated 6 years ago
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆52 Updated last year
- Tutorial on NE processing for Digital Humanities - DH Utrecht 2019 ☆25 Updated 5 years ago
- DTA Base Format (DTABf) ☆17 Updated 2 months ago
- Augment line images for improving OCR datasets ☆9 Updated last year
- Java command line tool to convert PAGE XML files with layout and text content to PDF ☆10 Updated 4 years ago
- OCR-D post-correction module based on weighted finite-state transducers ☆11 Updated 10 months ago
- Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames for data analysis ☆11 Updated 3 months ago
- Pipeline for the production of digital scholarly editions of archival collections ☆11 Updated 9 months ago