Early-Modern-OCR/hOCR-De-Noising

Code to remove "noise" from the hOCR output of Tesseract OCR.

☆14

Alternatives and similar repositories for hOCR-De-Noising

Users who are interested in hOCR-De-Noising compare it to the libraries listed below.

Early-Modern-OCR / FrankenPlus
Part of eMOP: Franken+, a tool for creating font training for the Tesseract OCR engine from page images.
☆24 | Sep 24, 2015 | Updated 10 years ago
brobertson / ciaconna
Polytonic Greek OCR tool suite based on Ocropus 0.7
☆13 | Jul 5, 2023 | Updated 3 years ago
kba / transkribus-to-prima
Convert Transkribus PAGE-XML to standard PAGE-XML
☆12 | Dec 10, 2025 | Updated 7 months ago
jze / ocropus-model_fraktur
OCRopus model for Gothic print (Fraktur)
☆19 | Feb 16, 2020 | Updated 6 years ago
smurp / huviz
Interactive, customizable semantic web visualization
☆15 | Dec 27, 2025 | Updated 7 months ago
idhmc-tamu / eMOP
Files and code related to the Early Modern OCR Project (eMOP) at the IDHMC
☆16 | Oct 2, 2014 | Updated 11 years ago
TimCoogan / SmartWatcher
Simple Windows service designed to watch specific directories and take specific actions on: Create - Change - Rename - Delete files …
☆12 | Mar 21, 2016 | Updated 10 years ago
filak / hOCR-to-ALTO
Convert between Tesseract hOCR and ALTO XML using XSL stylesheets
☆60 | Mar 20, 2026 | Updated 4 months ago
zzolo / tables
Tables is a simple command-line tool and powerful library for importing data like a CSV or JSON file into relational tables.
☆14 | Mar 23, 2026 | Updated 4 months ago
tokee / quack
QA tool for scans with corresponding ALTO files
☆27 | Dec 2, 2022 | Updated 3 years ago
PublicI / fec-parse
A Node module to parse raw FEC electronic filings, inspired by Fech.
☆19 | Apr 24, 2025 | Updated last year
LibraryOfCongress / viewshare
A web application developed by Zepheira for the Library of Congress National Digital Information Infrastructure and Preservation Program …
☆45 | Apr 16, 2024 | Updated 2 years ago
Early-Modern-OCR / TesseractTraining
Training files produced for and by the Tesseract OCR engine for work on the Early Modern OCR Project (eMOP)
☆37 | Sep 24, 2015 | Updated 10 years ago
brobertson / rigaudon
Polytonic Greek OCR engine derived from Gamera and based on the work of Dalitz and Brandt
☆33 | Nov 25, 2014 | Updated 11 years ago
mittagessen / curt
☆15 | Jul 11, 2022 | Updated 4 years ago
newsdev / nyt-pyfec
A Python library for downloading, parsing and cleaning Federal Election Commission filings.
☆28 | Jan 30, 2024 | Updated 2 years ago
qurator-spk / neat
Named entity annotation tool
☆28 | Jul 6, 2023 | Updated 3 years ago
jkunze / bagitspec
☆34 | Nov 14, 2018 | Updated 7 years ago
dkpro / dkpro-c4corpus
DKPro C4CorpusTools is a collection of tools for processing the CommonCrawl corpus, including Creative Commons license detection, boilerplate…
☆53 | Jun 12, 2020 | Updated 6 years ago
benedikt-budig / glyph-miner
Glyph Miner, a system for extracting glyphs from early typeset prints
☆34 | Sep 29, 2016 | Updated 9 years ago
paalberti / tesseract-dan-fraktur
Tesseract OCR training data for Danish written in Fraktur script, and for a few other languages
☆17 | Aug 28, 2014 | Updated 11 years ago
asg017 / libfec
CLI for parsing FEC files, for federal campaign finance pipelines
☆24 | Apr 15, 2026 | Updated 3 months ago
seuretm / ocrd_typegroups_classifier
☆10 | Mar 16, 2023 | Updated 3 years ago
UB-Mannheim / GTCheck
Check your modified Ground Truth files with visual support!
☆10 | Jan 31, 2024 | Updated 2 years ago
kzagoris / ImageEnhancementTool
Implementation Code for Paper: K. Zagoris and I. Pratikakis, Bio-Inspired Modeling for the Enhancement of Historical Handwritten Document…
☆15 | Nov 24, 2017 | Updated 8 years ago
UB-Mannheim / crass
Crop And Splice Segments (of scanned pages)
☆14 | Mar 11, 2019 | Updated 7 years ago
oldani / asgi-testClient
Testing ASGI applications made easy!
☆11 | Aug 9, 2022 | Updated 3 years ago
CITlabRostock / citlab-article-separation-new
Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Ho…
☆22 | Sep 2, 2022 | Updated 3 years ago
pharos-alexandria / ocr-greek_cursive
Training files for Greek cursive script (in early print)
☆15 | May 26, 2021 | Updated 5 years ago
maplight / CAPS
CAL-ACCESS Campaign Power Search
☆12 | Nov 2, 2017 | Updated 8 years ago
eddieantonio / ocreval
Update of the ISRI Analytic Tools for OCR Evaluation with UTF-8 support
☆60 | Apr 16, 2021 | Updated 5 years ago
openelections / fec_results
Federal election results data from the Federal Election Commission
☆27 | Oct 3, 2016 | Updated 9 years ago
Early-Modern-OCR / RETAS
Part of eMOP: the Recursive Text Alignment Tool compares OCR text results to ground truth character by character and computes a score.
☆24 | Sep 24, 2015 | Updated 10 years ago
danielecook / tut
A collection of CSV/TSV Utilities
☆13 | Jun 2, 2020 | Updated 6 years ago
OCR-D / ocrd_anybaseocr
DFKI Layout Detection for OCR-D
☆47 | May 1, 2025 | Updated last year
hbmartin / Directory-SwiftUI
A directory demo app written with SwiftUI, Core Data, and Alamofire
☆16 | Jan 12, 2020 | Updated 6 years ago
dpla-attic / ingestion
The DPLA ingestion system
☆23 | Oct 10, 2019 | Updated 6 years ago
cisocrgroup / Resources
Manuals, lexica, OCR test data for PoCoTo and the profiler
☆15 | Jul 2, 2021 | Updated 5 years ago
noahbrenner / gsvg
Reformat SVG files to reduce git diff noise
☆17 | Jul 28, 2020 | Updated 6 years ago