A step-by-step C# implementation of the Docstrum algorithm
☆24 · Dec 13, 2020 · Updated 5 years ago
Alternatives and similar repositories for simple-docstrum
Users that are interested in simple-docstrum are comparing it to the libraries listed below.
- Tools for extracting figures, tables, text, etc. from a PDF document. ☆33 · Nov 25, 2020 · Updated 5 years ago
- ☆71 · Apr 3, 2018 · Updated 7 years ago
- Document Layout Analysis resources repos for development with PdfPig. ☆633 · Oct 1, 2023 · Updated 2 years ago
- PAGE XML format collection for document image page content and more ☆71 · Jan 16, 2026 · Updated 2 months ago
- BoundaryNet - A Semi-Automatic Layout Annotation Tool ☆24 · Dec 11, 2021 · Updated 4 years ago
- Transcriptions of primers (19th century) ☆11 · Oct 31, 2025 · Updated 4 months ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆23 · Sep 11, 2020 · Updated 5 years ago
- Convert PubLayNet data into METS/PAGE-XML ☆10 · Mar 17, 2020 · Updated 6 years ago
- Extract tables from PDF files (port of tabula-java) ☆205 · Mar 17, 2025 · Updated last year
- NLP system for identifying patient housing status in Veteran Affairs ☆12 · Feb 18, 2024 · Updated 2 years ago
- OCR-D post-correction with encoder-attention-decoder LSTMs ☆13 · May 1, 2025 · Updated 10 months ago
- GloSAT Historical Measurement Table Dataset ☆11 · Dec 3, 2025 · Updated 3 months ago
- Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s) ☆17 · Sep 18, 2025 · Updated 6 months ago
- METS 1.x and METS 2 schemas ☆25 · May 28, 2025 · Updated 9 months ago
- Converters for various file formats used for representing OCR ☆12 · Apr 30, 2025 · Updated 10 months ago
- OCR-D post-correction module based on weighted finite-state transducers ☆11 · Jan 13, 2024 · Updated 2 years ago
- Deep learning based page layout analysis ☆195 · Apr 24, 2019 · Updated 6 years ago
- OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil ☆11 · Sep 24, 2021 · Updated 4 years ago
- A Rust client for Zotero's API ☆14 · Nov 6, 2023 · Updated 2 years ago
- Rust library for working with data from Wikidata. ☆14 · Jul 10, 2025 · Updated 8 months ago
- Wrapper for the kraken OCR engine ☆12 · Jul 12, 2025 · Updated 8 months ago
- Easy to use PDF CLI tool powered by PDFium and go-pdfium ☆34 · Mar 2, 2026 · Updated 3 weeks ago
- Digital Contracting Cookbook ☆10 · Mar 9, 2016 · Updated 10 years ago
- A simple document layout analysis using Python-OpenCV ☆127 · Aug 11, 2020 · Updated 5 years ago
- You Actually Look Twice At it ☆39 · Jan 21, 2025 · Updated last year
- OCR-D wrapper for detectron2 based segmentation models ☆17 · May 1, 2025 · Updated 10 months ago
- An Editor for creating simple or complex OCR workflows ☆17 · Jun 13, 2024 · Updated last year
- Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2) ☆15 · Jan 20, 2026 · Updated 2 months ago
- Progressively enhance your HTML with dynamic data ☆13 · May 1, 2018 · Updated 7 years ago
- Repository to use/train segmentation models for document layout analysis ☆19 · Jan 13, 2022 · Updated 4 years ago
- Haversine distance between two points ☆13 · Jun 20, 2023 · Updated 2 years ago
- Dokku buildpack for GitLab ☆22 · Apr 5, 2015 · Updated 10 years ago
- How About Machine Learning Enhancing Theses? - a pilot discovery project ☆14 · May 23, 2023 · Updated 2 years ago
- Just a test with gulp, lib-sass/bourbon, react and shoe ☆37 · Jan 28, 2014 · Updated 12 years ago
- Training files for Greek cursive script (in early print) ☆15 · May 26, 2021 · Updated 4 years ago
- A compound splitter based on the semantic regularities in the vector space of word embeddings. ☆16 · Mar 15, 2017 · Updated 9 years ago
- ☆11 · Mar 16, 2026 · Updated last week
- ☆11 · Nov 13, 2020 · Updated 5 years ago
- This repository shows how to efficiently process variable-length sequences in TensorFlow. ☆14 · Apr 26, 2022 · Updated 3 years ago