mauvilsa/tesseract-recognize

mauvilsa / tesseract-recognize

Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format

☆47

Alternatives and similar repositories for tesseract-recognize

Users who are interested in tesseract-recognize are comparing it to the libraries listed below.

mauvilsa / nw-page-editor
Simple app for visual editing of Page XML files
☆31 · Sep 25, 2025 · Updated 10 months ago
seuretm / ocrd_typegroups_classifier
☆10 · Mar 16, 2023 · Updated 3 years ago
jze / ocropus-model_fraktur
OCRopus model for Gothic print (Fraktur)
☆19 · Feb 16, 2020 · Updated 6 years ago
ASVLeipzig / cor-asv-fst
OCR-D post-correction module based on weighted finite-state transducers
☆11 · Jan 13, 2024 · Updated 2 years ago
Doreenruirui / okralact
A repository for online OCRD training infrastructure.
☆13 · Aug 20, 2020 · Updated 5 years ago
kba / transkribus-to-prima
Convert Transkribus PAGE-XML to standard PAGE-XML
☆12 · Dec 10, 2025 · Updated 7 months ago
OCR-D / ocrd_anybaseocr
DFKI Layout Detection for OCR-D
☆47 · May 1, 2025 · Updated last year
CITlabRostock / citlab-article-separation-new
Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Ho…
☆22 · Sep 2, 2022 · Updated 3 years ago
OCR-D / page-to-alto
Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)
☆17 · Jun 5, 2026 · Updated last month
PRImA-Research-Lab / prima-core-libs
Core libraries by the PRImA Research Lab
☆16 · Jul 30, 2024 · Updated last year
jbaiter / archiscribe
Web application for transcribing OCR ground truth from Archive.org
☆18 · Feb 22, 2018 · Updated 8 years ago
lquirosd / P2PaLA
Page to PAGE Layout Analysis Tool
☆192 · Jan 17, 2022 · Updated 4 years ago
andbue / nashi
Some bits of javascript to transcribe scanned pages using PageXML
☆17 · May 27, 2026 · Updated last month
zamazan4ik / PRLib
Pre-Recognition Library - library with algorithms for improving OCR quality.
☆38 · Mar 20, 2021 · Updated 5 years ago
mittagessen / curt
☆15 · Jul 11, 2022 · Updated 4 years ago
uniwue-zpd / PAGETools
Small collection of PAGE XML related scripts used at the ZPD Würzburg
☆12 · Aug 2, 2024 · Updated last year
OCR-D / ocrd_all
Master repository which includes most other OCR-D repositories as submodules
☆73 · Jul 4, 2025 · Updated last year
hnesk / browse-ocrd
An extensible viewer for OCR-D mets.xml files
☆23 · May 30, 2024 · Updated 2 years ago
OpenPhilology / nidaba
An expandable and scalable OCR pipeline
☆90 · Nov 14, 2017 · Updated 8 years ago
omni-us / pagexml
Library in C++ and a python wrapper for dealing with Page XML files
☆13 · Apr 25, 2025 · Updated last year
ocropus-archive / DUP-ocropy2
Next generation OCR engine based on LSTMs.
☆51 · Apr 8, 2018 · Updated 8 years ago
bertsky / ocrd_publaynet
convert PubLayNet data into METS/PAGE-XML
☆10 · Mar 17, 2020 · Updated 6 years ago
UB-Mannheim / reichsanzeiger-nlp
Reichsanzeiger-NLP: NER/NEL corpus for the German historical newspaper "Deutscher Reichsanzeiger und Preußischer Staatsanzeiger" (1819–19…
☆16 · Oct 18, 2024 · Updated last year
UB-Mannheim / GTCheck
Check your modified Ground Truth files with visual support!
☆10 · Jan 31, 2024 · Updated 2 years ago
qurator-spk / sbb_ner
Named Entity Recognition
☆19 · Feb 13, 2026 · Updated 5 months ago
achimrabus / polyscriptor
Multi-engine ATR for multiple languages and scripts
☆17 · Jul 10, 2026 · Updated 2 weeks ago
OCR-D / ocrd_pagetopdf
OCR-D wrapper for prima-pagetopdf
☆10 · Oct 30, 2025 · Updated 8 months ago
qurator-spk / mods4pandas
Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames for data analysis
☆15 · Aug 21, 2025 · Updated 11 months ago
qurator-spk / dinglehopper
An OCR evaluation tool
☆70 · Aug 22, 2025 · Updated 11 months ago
idhmc-tamu / eMOP
files and code related to the Early Modern OCR Project (eMOP) at the IDHMC
☆16 · Oct 2, 2014 · Updated 11 years ago
OCR-D / core
Collection of OCR-related python tools and wrappers from @OCR-D
☆135 · Updated this week
EuropeanaNewspapers / ner-app
Named Entity Recognition tool for Europeana Newspapers
☆14 · Apr 5, 2018 · Updated 8 years ago
Early-Modern-OCR / hOCR-De-Noising
code to remove "noise" from hOCR output of Tesseract OCR.
☆14 · Oct 24, 2016 · Updated 9 years ago
bibliocoll / JournalTouch
JournalTouch provides a touch-optimized interface for browsing current journal tables of contents in Responsive Design. Fun!
☆14 · May 27, 2019 · Updated 7 years ago
cneud / ocr-conversion
Conversions between various OCR formats
☆84 · Feb 13, 2026 · Updated 5 months ago
cisocrgroup / Resources
Manuals, lexica, OCR test data for PoCoTo and the profiler
☆15 · Jul 2, 2021 · Updated 5 years ago
qurator-spk / sbb_textline_detection
Detect textlines in document images
☆90 · May 27, 2024 · Updated 2 years ago
benedikt-budig / glyph-miner
Glyph Miner, a system for extracting glyphs from early typeset prints
☆34 · Sep 29, 2016 · Updated 9 years ago
ryanfb / ancientgreekocr-ocr-evaluation-tools
'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy.
☆23 · Feb 21, 2018 · Updated 8 years ago