tesseract-ocr/langdata

tesseract-ocr / langdata

Source training data for Tesseract for lots of languages

☆870

Alternatives and similar repositories for langdata

Users who are interested in langdata often compare it to the repositories listed below.

tesseract-ocr / tessdata
Trained models with fast variant of the "best" LSTM models + legacy models
☆7,607 · Mar 9, 2024 · Updated 2 years ago
tesseract-ocr / docs
Various documents related to Tesseract OCR
☆269 · Sep 12, 2021 · Updated 4 years ago
tesseract-ocr / langdata_lstm
Data used for LSTM model training
☆127 · Mar 9, 2024 · Updated 2 years ago
tesseract-ocr / tesseract
Tesseract Open Source OCR Engine (main repository)
☆75,484 · Updated this week
tesseract-ocr / tessdata_best
Best (most accurate) trained LSTM models.
☆1,568 · Mar 9, 2024 · Updated 2 years ago
tesseract-ocr / tessdata_fast
Fast integer versions of trained LSTM models
☆608 · Aug 1, 2024 · Updated last year
tesseract-ocr / tesstrain
Train Tesseract LSTM with make
☆722 · Apr 18, 2025 · Updated last year
Shreeshrii / tess5train-fonts
Files and Scripts to run Tesseract 5 LSTM Training using fonts
☆78 · Feb 6, 2022 · Updated 4 years ago
ocropus-archive / DUP-ocropy
Python-based tools for document analysis and OCR
☆3,466 · May 22, 2021 · Updated 5 years ago
nguyenq / jTessBoxEditor
Box editor and trainer for Tesseract OCR
☆247 · Jun 2, 2026 · Updated last month
gheyret / UyghurNgram
Make N-Gram for Uyghur language
☆15 · Dec 24, 2020 · Updated 5 years ago
MicrocontrollersAndMore / OpenCV_KNN_Character_Recognition_Machine_Learning
☆10 · Nov 24, 2015 · Updated 10 years ago
tesseract-ocr / tessdoc
Tesseract documentation
☆2,403 · Jun 28, 2026 · Updated 3 weeks ago
UB-Mannheim / Fibeln
Transcriptions of primers (19th century)
☆11 · Oct 31, 2025 · Updated 8 months ago
UB-Mannheim / eScriptorium_Dokumentation
This repository provides German documentation relating to the text recognition and transcription platform eScriptorium. The documentation…
☆16 · Dec 6, 2025 · Updated 7 months ago
tmbdev / clstm
A small C++ implementation of LSTM networks, focused on OCR.
☆831 · Oct 24, 2019 · Updated 6 years ago
pannous / tensorflow-ocr
🖺 OCR using tensorflow with attention
☆644 · Sep 5, 2019 · Updated 6 years ago
UB-Mannheim / GTCheck
Check your modified Ground Truth files with visual support!
☆10 · Jan 31, 2024 · Updated 2 years ago
tesseract-ocr / tessdata_contrib
User contributed (non Google) OCR models for Tesseract
☆33 · Jun 12, 2026 · Updated last month
UB-Mannheim / blatt
NLP-helper for OCR-ed pages in PAGE XML format
☆10 · Dec 6, 2024 · Updated last year
ocropus / hocr-tools
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
☆416 · Aug 10, 2024 · Updated last year
DanBloomberg / leptonica
Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The …
☆2,061 · Jul 12, 2026 · Updated last week
Calamari-OCR / calamari_models
Pretrained mixed models to be used with Calamari.
☆74 · Oct 1, 2024 · Updated last year
GNOME / ocrfeeder
Read-only mirror of https://gitlab.gnome.org/GNOME/ocrfeeder
☆95 · Apr 14, 2026 · Updated 3 months ago
guzhenping / the-Papers-and-Data-of-Tesseract-OCR-
I read the classic papers written by Ray Smith. While reading, I made some notes in Chinese. By now, I have learned lots of information…
☆31 · Jan 19, 2018 · Updated 8 years ago
wikimedia / wikimedia-ocr
This repository is now at https://gitlab.wikimedia.org/toolforge-repos/ocr
☆17 · May 19, 2026 · Updated 2 months ago
openpaperwork / pyocr
A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab
☆929 · Jun 13, 2018 · Updated 8 years ago
pannous / caffe-ocr
OCR with caffe deep learning framework -> Migrated to tensorflow
☆215 · Dec 22, 2016 · Updated 9 years ago
UB-Mannheim / reichsanzeiger-nlp
Reichsanzeiger-NLP: NER/NEL corpus for the German historical newspaper "Deutscher Reichsanzeiger und Preußischer Staatsanzeiger" (1819–19…
☆16 · Oct 18, 2024 · Updated last year
rmtheis / android-ocr
Experimental optical character recognition app
☆2,226 · May 5, 2018 · Updated 8 years ago
ulb-sachsen-anhalt / ulb-zeitungsprojekt-hp1
Training data from "Hauptphase I" of project "Digitalisierung historischer deutscher Zeitungen"
☆12 · Dec 17, 2021 · Updated 4 years ago
tongpi / synthtext100kCH
The Tongpai Chinese synthetic text dataset is a dataset for training natural-scene text recognition models.
☆45 · May 24, 2017 · Updated 9 years ago
w3c / type-samples
A place to find and contribute examples of typographic features in text, especially from non-Latin scripts. Please read the instructions…
☆21 · Sep 25, 2025 · Updated 9 months ago
UB-Mannheim / tesseract
Tesseract Open Source OCR Engine (main repository)
☆4,517 · Jun 20, 2026 · Updated last month
zdenop / qt-box-editor
QT Box Editor of tesseract-ocr box files
☆176 · Oct 14, 2024 · Updated last year
menzenski / Uyghur-resources
Collection of resources for Uyghur linguistics.
☆15 · Nov 29, 2015 · Updated 10 years ago
tianzhi0549 / CTPN
Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)
☆1,287 · Oct 15, 2021 · Updated 4 years ago
OCR-D / ocrd_all
Master repository which includes most other OCR-D repositories as submodules
☆73 · Jul 4, 2025 · Updated last year
google / language-resources
Datasets and tools for basic natural language processing.
☆389 · Sep 10, 2021 · Updated 4 years ago