ocropus / ocropus4-eval
Tools for evaluating OCR performance relative to ground truth.
☆10 · Updated last year
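This page does not say which metrics ocropus4-eval computes. As a hedged illustration of what "evaluating OCR performance relative to ground truth" commonly means, here is a minimal sketch of character error rate (CER) and word error rate (WER), both defined as edit distance divided by ground-truth length. The function names `levenshtein`, `character_error_rate`, and `word_error_rate` are hypothetical and are not part of ocropus4-eval's API.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `ref` into `hyp` (works on strings or token lists)."""
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """CER = character edit distance / number of ground-truth characters."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


def word_error_rate(ground_truth: str, ocr_output: str) -> float:
    """WER = word edit distance / number of ground-truth words
    (hypothetical helper; splits on whitespace only)."""
    ref_words = ground_truth.split()
    if not ref_words:
        raise ValueError("ground truth must contain at least one word")
    return levenshtein(ref_words, ocr_output.split()) / len(ref_words)
```

For example, `character_error_rate("kitten", "sitting")` is 3 edits over 6 reference characters, i.e. 0.5. Real evaluation tools usually add Unicode normalization and alignment reports on top of this; those details are omitted here.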
Alternatives and similar repositories for ocropus4-eval
Users who are interested in ocropus4-eval are comparing it to the libraries listed below.
- ☆67 · Updated last year
- Python Package to reconstruct the original continuous text from PDFs with language models ☆32 · Updated last year
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 · Updated 2 months ago
- Extracts plain text, language identification and more metadata from WARC records ☆23 · Updated 4 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆70 · Updated this week
- Ergonomic line-by-line transcription of scanned text. ☆53 · Updated 4 years ago
- Small python package to measure OCR quality and other related metrics. ☆25 · Updated last year
- DFKI Layout Detection for OCR-D ☆47 · Updated 2 months ago
- ☆27 · Updated last year
- In-browser OCR of Ancient Greek and Latin ☆26 · Updated 2 months ago
- OCR-D post-correction module based on weighted finite-state transducers ☆11 · Updated last year
- Logical structure analysis for visually structured documents ☆91 · Updated 2 years ago
- Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s) ☆17 · Updated 2 months ago
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆54 · Updated 2 years ago
- Glyph Miner, a system for extracting glyphs from early typeset prints ☆34 · Updated 8 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31 · Updated 3 years ago
- ☆55 · Updated last year
- Convert a corpus of PDF to clean text files on a distributed architecture ☆39 · Updated last year
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 · Updated 7 years ago
- Layout Analysis Dataset with Segmonto (LADaS) ☆21 · Updated last week
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆22 · Updated 6 months ago
- Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages. ☆20 · Updated 2 months ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆18 · Updated 11 months ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆46 · Updated 3 months ago
- An OCR evaluation tool ☆66 · Updated 2 months ago
- A suite of batches and tools for OCR tasks. ☆71 · Updated 2 years ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆51 · Updated 4 months ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 · Updated 3 months ago
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions ☆19 · Updated 2 years ago
- Corpus Build OCR platform ☆8 · Updated 2 years ago