dinosauria123/gcv2hocr

gcv2hocr converts Google Cloud Vision OCR output to hOCR to make a searchable PDF.

☆108

Alternatives and similar repositories for gcv2hocr

Users that are interested in gcv2hocr are comparing it to the libraries listed below.

PublicI / pdf-gcv-ocr
Tool to OCR PDFs using Google Cloud Vision
☆42 · Dec 7, 2022 · Updated 3 years ago

ocropus / hocr-tools
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
☆416 · Aug 10, 2024 · Updated last year

dinosauria123 / makepdf
Make a searchable PDF via Google Cloud Vision OCR
☆14 · Jan 17, 2020 · Updated 6 years ago

UB-Mannheim / ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
☆204 · May 21, 2025 · Updated last year

cneud / ocr-conversion
Conversions between various OCR formats
☆84 · Feb 13, 2026 · Updated 5 months ago

filak / hOCR-to-ALTO
Convert between Tesseract hOCR and ALTO XML using XSL stylesheets
☆60 · Mar 20, 2026 · Updated 4 months ago

ultrasaurus / hocr-javascript
JS for overlaying OCR on an image using hOCR-formatted HTML
☆26 · Jul 30, 2016 · Updated 9 years ago

seuretm / ocrd_typegroups_classifier
☆10 · Mar 16, 2023 · Updated 3 years ago

kba / hocr-spec
The hOCR Embedded OCR Workflow and Output Format
☆74 · Aug 12, 2024 · Updated last year

kba / hocrjs
Working with hOCR in Javascript
☆134 · Mar 4, 2023 · Updated 3 years ago

PRImA-Research-Lab / prima-page-converter
Command-line tool to convert page layout files to the latest PAGE XML format. It supports all previous versions of the PAGE format as wel…
☆25 · Jan 30, 2021 · Updated 5 years ago

UB-Mannheim / crass
Crop And Splice Segments (of scanned pages)
☆14 · Mar 11, 2019 · Updated 7 years ago

OCR-D / ocrd_anybaseocr
DFKI Layout Detection for OCR-D
☆47 · May 1, 2025 · Updated last year

altoxml / schema
ALTO XML schema - latest and all former versions
☆55 · Jul 8, 2026 · Updated last week

TurkuNLP / ocr-correction
Post-processing OCR errors with seq2seq models
☆28 · Jul 30, 2020 · Updated 5 years ago

jze / ocropus-model_fraktur
OCRopus model for Gothic print (Fraktur)
☆19 · Feb 16, 2020 · Updated 6 years ago

rsling / texrex
texrex web page cleaning & ClaraX random walk crawler
☆11 · Dec 13, 2021 · Updated 4 years ago

jbrinley / HocrConverter
Create PDFs and plain text from hOCR documents
☆36 · Jun 11, 2010 · Updated 16 years ago

OCR-D / format-converters
Converters for various file formats used for representing OCR
☆12 · Apr 30, 2025 · Updated last year

PRImA-Research-Lab / prima-aletheia-web-emop
Web-based page layout editor created for EMOP (Early Modern OCR Project).
☆11 · May 21, 2021 · Updated 5 years ago

PRImA-Research-Lab / prima-core-libs
Core libraries by the PRImA Research Lab
☆16 · Jul 30, 2024 · Updated last year

OCR-D / ocrd_all
Master repository which includes most other OCR-D repositories as submodules
☆73 · Jul 4, 2025 · Updated last year

OCR-D / ocrd_segment
OCR-D-compliant page segmentation
☆67 · May 6, 2026 · Updated 2 months ago

ASVLeipzig / cor-asv-fst
OCR-D post-correction module based on weighted finite-state transducers
☆11 · Jan 13, 2024 · Updated 2 years ago

isaomatsunami / clstm-Japanese
Japanese trained data of clstm
☆15 · Jun 6, 2016 · Updated 10 years ago

UB-Mannheim / ocr-gt-tools
Ergonomic line-by-line transcription of scanned text.
☆53 · Feb 2, 2026 · Updated 5 months ago

stefan-it / gc4lm
GC4LM: A Colossal (Biased) language model for German
☆13 · May 2, 2021 · Updated 5 years ago

altoxml / documentation
Documentation and use cases for ALTO XML
☆42 · Sep 10, 2018 · Updated 7 years ago

CITlabRostock / citlab-article-separation-new
Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Ho…
☆22 · Sep 2, 2022 · Updated 3 years ago

cisocrgroup / ocrd_cis
OCR-D python tools
☆33 · Aug 16, 2024 · Updated last year

UB-Mannheim / ocromore
Process, enhance and evaluate multiple OCR output.
☆24 · Dec 2, 2025 · Updated 7 months ago

mauvilsa / nw-page-editor
Simple app for visual editing of Page XML files
☆31 · Sep 25, 2025 · Updated 9 months ago

mauvilsa / tesseract-recognize
Tool that does layout analysis and/or text recognition using Tesseract and outputs the result in Page XML format
☆47 · Mar 31, 2025 · Updated last year

wolfgangmm / tei-simple-pm
An implementation of the TEI Simple ODD extensions for processing models in XQuery.
☆22 · Jul 24, 2019 · Updated 6 years ago

eeditiones / tei-publisher-static
A static site generator for TEI Publisher
☆13 · Mar 8, 2022 · Updated 4 years ago

not-implemented / hocr-proofreader
Web based JavaScript GUI library for proofreading/editing hOCR
☆102 · Sep 17, 2018 · Updated 7 years ago

Early-Modern-OCR / hOCR-De-Noising
Code to remove "noise" from the hOCR output of Tesseract OCR.
☆14 · Oct 24, 2016 · Updated 9 years ago

NorskRegnesentral / NeuralTextSanitizer
Neural models for detecting and masking personal information from texts
☆16 · Nov 25, 2022 · Updated 3 years ago

concordusapps / python-hocr
hOCR manipulation and utility library; provides an hocr2pdf binary.
☆14 · Mar 5, 2018 · Updated 8 years ago