ocropus / ocropus4-eval

Tools for evaluating OCR performance relative to ground truth.
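OCR output is often scored against ground truth as a character error rate (CER), which is the edit distance between the recognized text and the reference transcription divided by the length of the reference. The word error rate (WER) applies the same calculation to word sequences. Below is a minimal, illustrative Python sketch that uses that common definition. It is not the ocropus4-eval API, and the function names `levenshtein`, `character_error_rate` and `word_error_rate` are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on strings (character level) or lists of words (word level).
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Character edit distance normalized by the ground-truth length (hypothetical helper)."""
    if not ground_truth:
        # Undefined for an empty reference; treat any output as an infinite error.
        return 0.0 if not ocr_output else float("inf")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


def word_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Same idea on whitespace-separated words (hypothetical helper)."""
    gt_words = ground_truth.split()
    if not gt_words:
        return 0.0 if not ocr_output.split() else float("inf")
    return levenshtein(gt_words, ocr_output.split()) / len(gt_words)


if __name__ == "__main__":
    # One deleted "l" and one "l" misread as "1": 2 edits over 11 characters.
    print(character_error_rate("hello world", "helo wor1d"))
    # Both words differ, so both count as substitutions.
    print(word_error_rate("hello world", "helo wor1d"))
```

Dedicated tools in this space may differ in important details, such as Unicode normalization, how whitespace is handled, and whether errors are counted per line or per page. Treat this sketch only as an illustration of the basic metric.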

☆10

Alternatives and similar repositories for ocropus4-eval

Users who are interested in ocropus4-eval are comparing it to the libraries listed below.

Pleias / marginalia
☆67 · Updated last year
pd3f / pd3f-core
📑 Python Package to reconstruct the original continuous text from PDFs with language models
☆32 · Updated last year
OCR-D / ocrd_tesserocr
Run tesseract with the tesserocr bindings with @OCR-D's interfaces
☆39 · Updated 2 months ago
bitextor / warc2text
Extracts plain text, language identification and more metadata from WARC records
☆23 · Updated 4 months ago
marieai / marie-ai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip…
☆70 · Updated this week
UB-Mannheim / ocr-gt-tools
Ergonomic line-by-line transcription of scanned text.
☆53 · Updated 4 years ago
Pleias / OCRoscope
Small python package to measure OCR quality and other related metrics.
☆25 · Updated last year
OCR-D / ocrd_anybaseocr
DFKI Layout Detection for OCR-D
☆47 · Updated 2 months ago
ocropus-archive / ocropus4-old
☆27 · Updated last year
dcthree / antigrapheus
In-browser OCR of Ancient Greek and Latin
☆26 · Updated 2 months ago
ASVLeipzig / cor-asv-fst
OCR-D post-correction module based on weighted finite-state transducers
☆11 · Updated last year
stanfordnlp / pdf-struct
Logical structure analysis for visually structured documents
☆91 · Updated 2 years ago
OCR-D / spec
Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s)
☆17 · Updated 2 months ago
natliblux / nautilusocr
METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)
☆54 · Updated 2 years ago
benedikt-budig / glyph-miner
Glyph Miner, a system for extracting glyphs from early typeset prints
☆34 · Updated 8 years ago
weaviate / biggraph-wikidata-search-with-weaviate
Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine
☆31 · Updated 3 years ago
wjbmattingly / LeetTopic
☆55 · Updated last year
usnistgov / ocr-pipeline
Convert a corpus of PDF to clean text files on a distributed architecture
☆39 · Updated last year
ryanfb / ancientgreekocr-ocr-evaluation-tools
'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy.
☆22 · Updated 7 years ago
DEFI-COLaF / LADaS
Layout Analysis Dataset with Segmonto (LADaS)
☆21 · Updated last week
dataiku / dss-plugin-nlp-preparation
Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼
☆22 · Updated 6 months ago
acoli-repo / olia
Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages.
☆20 · Updated 2 months ago
wjbmattingly / bagpipes-spacy
Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities.
☆18 · Updated 11 months ago
mauvilsa / tesseract-recognize
Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format
☆46 · Updated 3 months ago
qurator-spk / dinglehopper
An OCR evaluation tool
☆66 · Updated 2 months ago
poke1024 / origami
A suite of batches and tools for OCR tasks.
☆71 · Updated 2 years ago
papercast-dev / papercast
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO…
☆51 · Updated 4 months ago
alea-institute / nupunkt
Next-generation Punkt sentence boundary detection with zero dependencies
☆17 · Updated 3 months ago
dbmdz / clef-hipe
Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions
☆19 · Updated 2 years ago
berkmancenter / corpusbuilder
Corpus Build OCR platform
☆8 · Updated 2 years ago