CopenhagenCityArchives / CorrectOCR
Machine Learning-assisted correction of OCR errors in historical corpora
☆9 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for CorrectOCR
- ☆10 · Updated 5 years ago
- Using Conditional Random Fields for segmenting Latin words written in scriptio continua ☆10 · Updated 6 years ago
- Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras… ☆11 · Updated 5 years ago
- Python 3 library for processing historical English ☆64 · Updated 3 months ago
- In-browser OCR of Ancient Greek and Latin ☆23 · Updated 3 weeks ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 · Updated 3 months ago
- Wrapper around pixel classifier ☆9 · Updated 2 years ago
- Master repository which includes most other OCR-D repositories as submodules ☆72 · Updated last month
- An OCR evaluation tool ☆64 · Updated last month
- PAGE XML format collection for document image page content and more ☆66 · Updated 3 years ago
- Python-based Wikidata framework for easy dataframe extraction ☆39 · Updated 11 months ago
- Tutorial on named-entity (NE) processing for Digital Humanities, DH Utrecht 2019 ☆25 · Updated 5 years ago
- Supplementary code for "Name2Vec: Personal Names Embeddings", presented at the Canadian Conference on AI 2019. ☆18 · Updated 4 years ago
- tesseractXplore: an easy-to-use Tesseract GUI with full control ☆21 · Updated 3 years ago
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆52 · Updated last year
- Search for images on Steam using natural language queries. ☆11 · Updated 3 years ago
- This repository contains code and data download instructions for the workshop paper "Improving Hierarchical Product Classification using … ☆17 · Updated 3 years ago
- ☆50 · Updated this week
- A suite of batches and tools for OCR tasks. ☆71 · Updated last year
- OCR-D-compliant page segmentation ☆67 · Updated 2 months ago
- NewsEye / READ OCR training dataset from Austrian Newspapers (1864–1911) ☆15 · Updated 9 months ago
- Extract tabular information from scanned documents (PDF to CSV) ☆13 · Updated 4 years ago
- Code examples for the Google Natural Language API. ☆13 · Updated 5 years ago
- ☆15 · Updated 3 years ago
- OCR-D post-correction module based on weighted finite-state transducers ☆11 · Updated 10 months ago
- Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s) ☆17 · Updated 3 months ago
- A web app built with Streamlit that summarizes input text ☆13 · Updated 3 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 · Updated 2 years ago
- Repository hosting the common code for the entity-fishing clients ☆9 · Updated 6 months ago
- Next-generation OCR engine based on LSTMs. ☆52 · Updated 6 years ago