A step-by-step C# implementation of the Docstrum algorithm
☆24Dec 13, 2020Updated 5 years ago
Alternatives and similar repositories for simple-docstrum
Users that are interested in simple-docstrum are comparing it to the libraries listed below.
- Tools for extracting figures, tables, text, etc. from a PDF document.☆33Nov 25, 2020Updated 5 years ago
- Document Layout Analysis resources repos for development with PdfPig.☆634Oct 1, 2023Updated 2 years ago
- PAGE XML format collection for document image page content and more☆71Jan 16, 2026Updated 2 months ago
- Simple docker deployment of document layout analysis using detectron2☆19Nov 7, 2021Updated 4 years ago
- BoundaryNet - A Semi-Automatic Layout Annotation Tool☆24Dec 11, 2021Updated 4 years ago
- Transcriptions of reading primers (19th century)☆11Oct 31, 2025Updated 5 months ago
- Implementation of BertGrid : https://arxiv.org/abs/1909.04948☆30Apr 10, 2024Updated 2 years ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an…☆23Sep 11, 2020Updated 5 years ago
- ngram graphs library☆12Dec 2, 2021Updated 4 years ago
- A C# library to extract tabular data from PDFs (port of camelot Python version using PdfPig).☆35Feb 4, 2022Updated 4 years ago
- the MSER for text detection☆11Jun 20, 2017Updated 8 years ago
- NLP system for identifying patient housing status in Veteran Affairs☆12Feb 18, 2024Updated 2 years ago
- RUN LENGTH SMOOTHING ALGORITHM(RLSA) is a method mainly used for block segmentation and text discrimination. It helps to extract the nece…☆24Jun 21, 2022Updated 3 years ago
- Smartcrop, a multi-pass context-aware cropping tool☆11Jan 31, 2018Updated 8 years ago
- OCR-D post-correction with encoder-attention-decoder LSTMs☆13May 1, 2025Updated 11 months ago
- SLUB Document Classification and Similarity Analysis☆10Aug 31, 2023Updated 2 years ago
- GloSAT Historical Measurement Table Dataset☆11Dec 3, 2025Updated 4 months ago
- METS 1.x and METS 2 schemas☆26May 28, 2025Updated 10 months ago
- convert qqwweee/keras-yolo3 h5 file to tensorflow pb file☆11Jul 17, 2020Updated 5 years ago
- Dice.com's relevancy feedback solr plugin created by Simon Hughes (Dice). Contains request handlers for doing MLT style recommendations, …☆23May 12, 2021Updated 4 years ago
- Grobid module for superconductor material and properties extraction☆22May 17, 2025Updated 10 months ago
- OCR-D post-correction module based on weighted finite-state transducers☆11Jan 13, 2024Updated 2 years ago
- Deep learning based page layout analysis☆197Apr 24, 2019Updated 6 years ago
- OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil☆11Sep 24, 2021Updated 4 years ago
- Training data from "Hauptphase I" of project "Digitalisierung historischer deutscher Zeitungen"☆12Dec 17, 2021Updated 4 years ago
- DEPRECATED: Use https://github.com/18F/gapps-download instead☆10Oct 27, 2015Updated 10 years ago
- Rust library for working with data from Wikidata.☆14Jul 10, 2025Updated 9 months ago
- A workflow system for Natural Language Processing.☆21Oct 17, 2019Updated 6 years ago
- Digital Contracting Cookbook☆10Mar 9, 2016Updated 10 years ago
- A simple document layout analysis using Python-OpenCV☆127Aug 11, 2020Updated 5 years ago
- OCR-D wrapper for detectron2 based segmentation models☆17May 1, 2025Updated 11 months ago
- Just a nodeJS wrapper for ghostscript☆12Jul 12, 2023Updated 2 years ago
- An Editor for creating simple or complex OCR workflows☆17Jun 13, 2024Updated last year
- Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)☆15Jan 20, 2026Updated 2 months ago
- Repository to use/train segmentation models for document layout analysis☆19Jan 13, 2022Updated 4 years ago
- Progressively enhance your HTML with dynamic data☆13May 1, 2018Updated 7 years ago
- A playground for classifying products based on image and text features using deep learning.☆25Jul 7, 2019Updated 6 years ago
- RUN LENGTH SMOOTHING ALGORITHM(RLSA) is a method mainly used for block segmentation and text discrimination. It helps to extract the nece…☆29Nov 5, 2023Updated 2 years ago
- How About Machine Learning Enhancing Theses? - a pilot discovery project☆14May 23, 2023Updated 2 years ago