CopenhagenCityArchives / CorrectOCR
Machine Learning-assisted correction of OCR errors in historical corpora
☆9, updated 2 weeks ago
Related projects
Alternatives and complementary repositories for CorrectOCR
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces (☆39, updated 2 months ago)
- Wrapper around pixel classifier (☆9, updated 2 years ago)
- Master repository which includes most other OCR-D repositories as submodules (☆72, updated 3 weeks ago)
- ☆10, updated 5 years ago
- Python tools for Tesseract OCR training (☆25, updated 2 years ago)
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) (☆52, updated last year)
- A set of workflows for corpus building through OCR, post-correction and normalisation (☆48, updated 2 years ago)
- PAGE XML format collection for document image page content and more (☆66, updated 3 years ago)
- In-browser OCR of Ancient Greek and Latin (☆23, updated last week)
- Using Conditional Random Fields for segmenting Latin words written in scriptio continua (☆10, updated 6 years ago)
- Python 3 library for processing historical English (☆64, updated 3 months ago)
- OCR-D python tools (☆33, updated 2 months ago)
- OCR-D post-correction module based on weighted finite-state transducers (☆11, updated 9 months ago)
- An OCR evaluation tool (☆64, updated last month)
- Ergonomic line-by-line transcription of scanned text. (☆47, updated 3 years ago)
- Tutorial on NE processing for Digital Humanities - DH Utrecht 2019 (☆25, updated 5 years ago)
- A suite of batches and tools for OCR tasks. (☆71, updated last year)
- Pretrained mixed models to be used with Calamari. (☆58, updated last month)
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… (☆68, updated last year)
- Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s) (☆17, updated 2 months ago)
- tesseractXplore: an easy-to-use Tesseract GUI with full control (☆21, updated 3 years ago)
- ☆20, updated 5 years ago
- Conversions between various OCR formats (☆71, updated last year)
- 🚀 GUI for training spaCy models (☆53, updated 3 years ago)
- OCR-D post-correction with encoder-attention-decoder LSTMs (☆13, updated last month)
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" (☆35, updated 11 months ago)
- An implementation of Tiling and Corruption (TACo) Augmentations for OCR/HTR (☆15, updated 2 years ago)
- An efficient data structure for fast string similarity searches (☆23, updated 3 years ago)
- BERT and ELECTRA models trained on Europeana Newspapers (☆36, updated 2 years ago)
- Code and data for the paper at http://arxiv.org/abs/2004.07317 (☆16, updated 3 years ago)