ocropus/hocr-tools

Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

☆416

Alternatives and similar repositories for hocr-tools

Users that are interested in hocr-tools are comparing it to the libraries listed below.

dinosauria123 / gcv2hocr
gcv2hocr converts from Google Cloud Vision OCR output to hocr to make a searchable pdf.
☆108 · Oct 22, 2020 · Updated 5 years ago
UB-Mannheim / ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
☆204 · May 21, 2025 · Updated last year
kba / hocr-spec
The hOCR Embedded OCR Workflow and Output Format
☆74 · Aug 12, 2024 · Updated last year
kba / hocrjs
Working with hOCR in Javascript
☆134 · Mar 4, 2023 · Updated 3 years ago
cneud / ocr-conversion
Conversions between various OCR formats
☆84 · Feb 13, 2026 · Updated 5 months ago
filak / hOCR-to-ALTO
Convert between Tesseract hOCR and ALTO XML using XSL stylesheets
☆60 · Mar 20, 2026 · Updated 4 months ago
jbaiter / hocrviewer-mirador
View HOCR files with Mirador
☆30 · Sep 27, 2017 · Updated 8 years ago
ultrasaurus / hocr-javascript
JS for overlaying OCR on image using HOCR formatted HTML
☆26 · Jul 30, 2016 · Updated 9 years ago
not-implemented / hocr-proofreader
Web based JavaScript GUI library for proofreading/editing hOCR
☆102 · Sep 17, 2018 · Updated 7 years ago
altoxml / documentation
Documentation and use cases for ALTO XML
☆42 · Sep 10, 2018 · Updated 7 years ago
athento / hocr-parser
HOCR Specification Python Parser
☆12 · Sep 23, 2015 · Updated 10 years ago
cneud / alto-tools
Python tools for performing various operations on ALTO XML files
☆50 · Jun 12, 2026 · Updated last month
UB-Mannheim / ocr-gt-tools
Ergonomic line-by-line transcription of scanned text.
☆53 · Feb 2, 2026 · Updated 5 months ago
jbrinley / HocrConverter
Create PDFs and plain text from hOCR documents
☆36 · Jun 11, 2010 · Updated 16 years ago
tmbarchive / ocropus3-docker
Docker container for ocropus3 OCR system
☆12 · Aug 19, 2018 · Updated 7 years ago
mittagessen / kraken
OCR engine for all the languages
☆1,037 · Jul 16, 2026 · Updated last week
tokee / quack
QA-tool for scans with corresponding ALTO-files
☆27 · Dec 2, 2022 · Updated 3 years ago
ryanfb / ancientgreekocr-ocr-evaluation-tools
'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy.
☆23 · Feb 21, 2018 · Updated 8 years ago
OCR4all / LAREX
A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
☆198 · Updated this week
PRImA-Research-Lab / prima-page-viewer
Java based viewer for PAGE XML files (layout + text content). Also supports ALTO XML, FineReader XML, and HOCR.
☆36 · May 25, 2023 · Updated 3 years ago
pharos-alexandria / ocr-greek_cursive
Training files for Greek cursive script (in early print)
☆15 · May 26, 2021 · Updated 5 years ago
ocropus-archive / DUP-ocropy2
Next generation OCR engine based on LSTMs.
☆51 · Apr 8, 2018 · Updated 8 years ago
cisocrgroup / PoCoTo
The CIS OCR PostCorrectionTool
☆45 · Nov 7, 2022 · Updated 3 years ago
idhmc-tamu / eMOP
files and code related to the Early Modern OCR Project (eMOP) at the IDHMC
☆16 · Oct 2, 2014 · Updated 11 years ago
mbennett-uoe / whiiif
Simple IIIF Search service for OCRed texts
☆17 · Dec 16, 2020 · Updated 5 years ago
ocropus-archive / DUP-ocropy
Python-based tools for document analysis and OCR
☆3,466 · May 22, 2021 · Updated 5 years ago
altoxml / schema
ALTO XML schema - latest and all former versions
☆55 · Jul 8, 2026 · Updated 2 weeks ago
PublicI / pdf-gcv-ocr
Tool to OCR PDFs using Google Cloud Vision
☆42 · Dec 7, 2022 · Updated 3 years ago
PRImA-Research-Lab / prima-page-converter
Command line tool to convert page layout files to the latest PAGE XML format. It supports all previous versions of the PAGE format as wel…
☆25 · Jan 30, 2021 · Updated 5 years ago
iiif-prezi / iiif-prezi
IIIF Presentation API implementation in Python
☆35 · Apr 17, 2024 · Updated 2 years ago
jze / ocropus-model_fraktur
OCRopus model for Gothic print (Fraktur)
☆19 · Feb 16, 2020 · Updated 6 years ago
Calamari-OCR / calamari
Line based ATR Engine based on OCRopy
☆1,197 · Jun 23, 2026 · Updated last month
hnesk / browse-ocrd
An extensible viewer for OCR-D mets.xml files
☆23 · May 30, 2024 · Updated 2 years ago
PRImA-Research-Lab / prima-page-to-pdf
Java command line tool to convert PAGE XML files with layout and text content to PDF
☆10 · Apr 27, 2020 · Updated 6 years ago
OCR-D / ocrd_all
Master repository which includes most other OCR-D repositories as submodules
☆73 · Jul 4, 2025 · Updated last year
tmbarchive / docker-ocropus
A small Docker built for the OCRopus OCR system.
☆19 · Dec 16, 2017 · Updated 8 years ago
europeana / media-player
Media player developed under the Europeana Media Generic Services Project
☆13 · Sep 7, 2023 · Updated 2 years ago
OCR-D / page-to-alto
Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)
☆17 · Jun 5, 2026 · Updated last month
tberg12 / ocular
Ocular is a state-of-the-art historical OCR system.
☆270 · Jun 7, 2024 · Updated 2 years ago