TurkuNLP/ocr-correction

Post-processing OCR errors with seq2seq models

☆28

Alternatives and similar repositories for ocr-correction

Users who are interested in ocr-correction are comparing it to the repositories listed below.

KBNLresearch / ochre
Toolbox for OCR post-correction
☆120 · Sep 19, 2019 · Updated 6 years ago

mpsilfve / ocrpp
OCR post processing and spelling correction.
☆11 · Nov 12, 2018 · Updated 7 years ago

mikahama / natas
Python 3 library for processing historical English
☆68 · Aug 10, 2024 · Updated last year

Doreenruirui / ACL2018_Multi_Input_OCR
☆13 · Jun 25, 2019 · Updated 7 years ago

Dedsec-Xu / DatasetImgLabel-ICDAR2015
DatasetImgLabeler is an image annotation tool for researchers to prepare datasets in ICDAR2015 format
☆12 · Dec 7, 2019 · Updated 6 years ago

leftthomas / DANet
A PyTorch implementation of DANet based on CVPR 2019 paper "Dual Attention Network for Scene Segmentation"
☆11 · Oct 30, 2019 · Updated 6 years ago

Transkribus / TranskribusDU
Document Understanding tools
☆21 · Dec 22, 2021 · Updated 4 years ago

czcorpus / InterText_editor
Editor for aligned parallel texts (personal desktop application).
☆20 · Jan 15, 2026 · Updated 6 months ago

cisocrgroup / OCR-Workshop
Presentations, tutorials and data for the OCR workshop at LMU
☆16 · Jun 2, 2017 · Updated 9 years ago

Joon-Park92 / Zero-Shot-Translation-Transformer
Zero-Shot Translation implemented by Transformer
☆14 · Mar 24, 2023 · Updated 3 years ago

alexyorke / butter-fingers
A Python library to generate highly realistic typos (fuzz-testing)
☆13 · Mar 16, 2025 · Updated last year

tkianai / ICDAR2019-tools
Tools for ICDAR2019 competitions (fifth place)
☆11 · May 6, 2019 · Updated 7 years ago

brooklyn1900 / SPCNet
fixed some errors from AirBernard/Scene-Text-Detection-with-SPCNET
☆14 · Jul 29, 2019 · Updated 6 years ago

dinosauria123 / gcv2hocr
gcv2hocr converts from Google Cloud Vision OCR output to hocr to make a searchable pdf.
☆108 · Oct 22, 2020 · Updated 5 years ago

jze / ocropus-model_fraktur
OCRopus model for Gothic print (Fraktur)
☆19 · Feb 16, 2020 · Updated 6 years ago

apoorva-sharma / deep-frame-interpolation
Deep Learning Solution for Video Frame Interpolation
☆12 · Feb 22, 2017 · Updated 9 years ago

whq-hqw / sroie2019
This is an OCR solution for receipts, invoices, etc.
☆20 · May 24, 2020 · Updated 6 years ago

Early-Modern-OCR / FrankenPlus
Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images.
☆24 · Sep 24, 2015 · Updated 10 years ago

ASVLeipzig / cor-asv-fst
OCR-D post-correction module based on weighted finite-state transducers
☆11 · Jan 13, 2024 · Updated 2 years ago

qurator-spk / sbb_textline_detection
Detect textlines in document images
☆90 · May 27, 2024 · Updated 2 years ago

petterhh / ndt-tools
Tools for Norwegian NLP based on the Norwegian Dependency Treebank.
☆17 · Jun 8, 2017 · Updated 9 years ago

CatWang / Synthesize_text_generation_Python
A fairly complex Python project for generating realistic scene text. The original project could only generate English; after modification it can also generate Chinese. I also added code for cropping the text in images and saving the corresponding labels.
☆33 · May 4, 2017 · Updated 9 years ago

yxgong0 / DTR
A network for irregular text recognition.
☆26 · Dec 11, 2020 · Updated 5 years ago

ramtinms / tokenquery
TokenQuery (regular expressions over tokens)
☆28 · Mar 1, 2017 · Updated 9 years ago

PRImA-Research-Lab / prima-page-converter
Command line tool to convert page layout files to the latest PAGE XML format. It supports all previous versions of the PAGE format as wel…
☆25 · Jan 30, 2021 · Updated 5 years ago

DataTurks / Entity-Recognition-In-Resumes-SpaCy
Automatic Summarization of Resumes with NER -> Evaluate resumes at a glance through Named Entity Recognition
☆25 · Jul 16, 2019 · Updated 7 years ago

OCR-D / ocrd_segment
OCR-D-compliant page segmentation
☆67 · May 6, 2026 · Updated 2 months ago

kmike / dialog2017
☆10 · Jul 21, 2017 · Updated 8 years ago

leonlulu / DeepLayout
Deep learning based page layout analysis
☆197 · Apr 24, 2019 · Updated 7 years ago

Chris10M / RFB-Text-Detection
A Dense Text Detection model using Receptive Field Blocks
☆32 · Nov 21, 2022 · Updated 3 years ago

qurator-spk / sbb_images
Image Annotation Tool and Image Search
☆17 · Apr 24, 2026 · Updated 2 months ago

ltgoslo / norne
Norwegian Named Entities annotations on top of NDT (Norwegian Dependency Treebank)
☆71 · Sep 10, 2024 · Updated last year

ufal / treex
Treex NLP framework
☆32 · Jul 9, 2026 · Updated last week

kba / transkribus-to-prima
Convert Transkribus PAGE-XML to standard PAGE-XML
☆12 · Dec 10, 2025 · Updated 7 months ago

ltgoslo / simple_elmo_training
Minimal code to train ELMo models in recent versions of TensorFlow
☆14 · Jun 16, 2026 · Updated last month

cisnlp / parcoure
ParCourE - Parallel Corpus Explorer
☆12 · Dec 27, 2021 · Updated 4 years ago

falcondai / trained-ABS-model
a trained attention-based summarization model
☆10 · May 22, 2017 · Updated 9 years ago

SapienzaNLP / clubert
Distribution of word meanings in Wikipedia for English, Italian, French, German and Spanish.
☆10 · Jan 4, 2021 · Updated 5 years ago

bikash / DocumentUnderstanding
Research papers and code on information extraction from image/pdf
☆97 · Nov 25, 2022 · Updated 3 years ago