A tutorial on the PyTorch-based ocropus components.
☆73 Apr 18, 2020 Updated 5 years ago
Alternatives and similar repositories for das2018-tutorial
Users who are interested in das2018-tutorial are comparing it to the libraries listed below:
- Docker container for ocropus3 OCR system ☆12 Aug 19, 2018 Updated 7 years ago
- Next generation OCR engine based on LSTMs. ☆51 Apr 8, 2018 Updated 7 years ago
- Repository collecting all the submodules for the new PyTorch-based OCR System. ☆142 Feb 22, 2021 Updated 5 years ago
- ☆10 Mar 16, 2023 Updated 2 years ago
- ☆72 Jun 13, 2018 Updated 7 years ago
- ☆126 Apr 18, 2020 Updated 5 years ago
- ☆20 Aug 18, 2019 Updated 6 years ago
- Augment line images for improving OCR datasets ☆10 Oct 4, 2023 Updated 2 years ago
- Process, enhance and evaluate multiple OCR output. ☆24 Dec 2, 2025 Updated 2 months ago
- Segmenting a given document using recursive xy-cut algorithm. ☆12 Oct 9, 2018 Updated 7 years ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 Feb 21, 2018 Updated 8 years ago
- Glyph Miner, a system for extracting glyphs from early typeset prints ☆34 Sep 29, 2016 Updated 9 years ago
- OCR-D post-correction with encoder-attention-decoder LSTMs ☆13 May 1, 2025 Updated 9 months ago
- Tutorial on NE processing for Digital Humanities - DH Utrecht 2019 ☆25 Jul 18, 2019 Updated 6 years ago
- ☆26 Apr 18, 2020 Updated 5 years ago
- ☆14 Apr 18, 2020 Updated 5 years ago
- OCR-D post-correction module based on weighted finite-state transducers ☆11 Jan 13, 2024 Updated 2 years ago
- Rotation and skew detection using DL. ☆60 May 29, 2018 Updated 7 years ago
- Convert between Tesseract hOCR and ALTO XML using XSL stylesheets ☆59 Sep 25, 2025 Updated 5 months ago
- TensorFlow implementation of a segmentation system for document images. ☆35 Sep 9, 2018 Updated 7 years ago
- DatasetImgLabeler is an image annotation tool for researchers to prepare datasets in ICDAR2015 format ☆12 Dec 7, 2019 Updated 6 years ago
- This is an OCR solution for receipts, invoices, etc. ☆20 May 24, 2020 Updated 5 years ago
- OCR-D-compliant page segmentation ☆68 Nov 19, 2025 Updated 3 months ago
- Tools for TICCL ☆14 Dec 12, 2025 Updated 2 months ago
- CNN Image Retrieval Model Weights Ported ☆12 Jun 2, 2018 Updated 7 years ago
- convert PubLayNet data into METS/PAGE-XML ☆10 Mar 17, 2020 Updated 5 years ago
- ☆138 Apr 4, 2023 Updated 2 years ago
- Web based JavaScript GUI library for proofreading/editing hOCR ☆101 Sep 17, 2018 Updated 7 years ago
- DocBankLoader is a dataset loader for DocBank, and can convert DocBank to the Object Detection models' format. ☆25 Mar 17, 2021 Updated 4 years ago
- ☆10 May 24, 2019 Updated 6 years ago
- Python SIP wrapper for libtesseract (Apache license) ☆12 Feb 20, 2017 Updated 9 years ago
- ☆10 Jan 24, 2020 Updated 6 years ago
- Command line tool to convert page layout files to the latest PAGE XML format. It supports all previous versions of the PAGE format as wel… ☆24 Jan 30, 2021 Updated 5 years ago
- Ergonomic line-by-line transcription of scanned text. ☆54 Feb 2, 2026 Updated 3 weeks ago
- Crop And Splice Segments (of scanned pages) ☆14 Mar 11, 2019 Updated 6 years ago
- Code for our paper accepted at EMNLP 2023 (Findings) ☆14 Jan 5, 2024 Updated 2 years ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆408 Aug 10, 2024 Updated last year
- Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized ☆788 May 21, 2022 Updated 3 years ago
- Manuals, lexica, OCR test data for PoCoTo and the profiler ☆15 Jul 2, 2021 Updated 4 years ago