A step-by-step C# implementation of the Docstrum algorithm
☆24 Dec 13, 2020 Updated 5 years ago
Alternatives and similar repositories for simple-docstrum
Users that are interested in simple-docstrum are comparing it to the libraries listed below.
- Tools to extract figures, tables, text, etc. from a PDF document. ☆34 Nov 25, 2020 Updated 5 years ago
- ☆71 Apr 3, 2018 Updated 8 years ago
- Document Layout Analysis resources repos for development with PdfPig. ☆635 Oct 1, 2023 Updated 2 years ago
- Document Layout Analysis Projects ☆23 Sep 4, 2019 Updated 6 years ago
- PAGE XML format collection for document image page content and more ☆72 Jan 16, 2026 Updated 4 months ago
- Implementation of BertGrid: https://arxiv.org/abs/1909.04948 ☆30 Apr 10, 2024 Updated 2 years ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆23 Sep 11, 2020 Updated 5 years ago
- convert PubLayNet data into METS/PAGE-XML ☆10 Mar 17, 2020 Updated 6 years ago
- ngram graphs library ☆12 Dec 2, 2021 Updated 4 years ago
- Extract tables from PDF files (port of tabula-java) ☆210 May 4, 2026 Updated 2 weeks ago
- NLP system for identifying patient housing status in Veteran Affairs ☆11 Feb 18, 2024 Updated 2 years ago
- OCR-D post-correction with encoder-attention-decoder LSTMs ☆13 May 1, 2025 Updated last year
- ICDAR 2021 Competition on Scientific Literature Parsing ☆35 Aug 20, 2020 Updated 5 years ago
- SLUB Document Classification and Similarity Analysis ☆10 Aug 31, 2023 Updated 2 years ago
- GloSAT Historical Measurement Table Dataset ☆11 Dec 3, 2025 Updated 5 months ago
- Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s) ☆17 Sep 18, 2025 Updated 8 months ago
- METS 1.x and METS 2 schemas ☆26 May 28, 2025 Updated 11 months ago
- Converters for various file formats used for representing OCR ☆12 Apr 30, 2025 Updated last year
- Grobid module for superconductor material and properties extraction ☆22 May 17, 2025 Updated last year
- Reichsanzeiger-NLP: NER/NEL corpus for the German historical newspaper "Deutscher Reichsanzeiger und Preußischer Staatsanzeiger" (1819–19… ☆16 Oct 18, 2024 Updated last year
- Deep learning based page layout analysis ☆197 Apr 24, 2019 Updated 7 years ago
- OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil ☆11 Sep 24, 2021 Updated 4 years ago
- Training data from "Hauptphase I" of project "Digitalisierung historischer deutscher Zeitungen" ☆12 Dec 17, 2021 Updated 4 years ago
- Rust library for working with data from Wikidata. ☆14 Jul 10, 2025 Updated 10 months ago
- Wrapper for the kraken OCR engine ☆12 Jul 12, 2025 Updated 10 months ago
- Etalab's Lab IA Pseudonymization Demo source code ☆11 Aug 3, 2023 Updated 2 years ago
- A workflow system for Natural Language Processing. ☆21 Oct 17, 2019 Updated 6 years ago
- A simple document layout analysis using Python-OpenCV ☆127 Aug 11, 2020 Updated 5 years ago
- Detect textlines in document images ☆90 May 27, 2024 Updated last year
- Just a Node.js wrapper for Ghostscript ☆12 Jul 12, 2023 Updated 2 years ago
- Rust library for extracting data from HTML tables. ☆13 Mar 4, 2024 Updated 2 years ago
- Repository to use/train segmentation models for document layout analysis ☆19 Jan 13, 2022 Updated 4 years ago
- A Trello webhook server ☆10 May 18, 2016 Updated 10 years ago
- An in-memory SQL database in Rust. ☆15 Aug 15, 2021 Updated 4 years ago
- ☆13 Jun 25, 2019 Updated 6 years ago
- Training files for Greek cursive script (in early print) ☆15 May 26, 2021 Updated 4 years ago
- Recognize text using Calamari OCR and the OCR-D framework ☆16 May 13, 2025 Updated last year
- A compound splitter based on the semantic regularities in the vector space of word embeddings. ☆16 Mar 15, 2017 Updated 9 years ago
- An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks ☆11 Mar 15, 2022 Updated 4 years ago