A step-by-step C# implementation of the Docstrum algorithm
☆24Dec 13, 2020Updated 5 years ago
Alternatives and similar repositories for simple-docstrum
Users who are interested in simple-docstrum are comparing it to the libraries listed below.
- Tools for extracting figures, tables, text, etc. from a PDF document.☆33Nov 25, 2020Updated 5 years ago
- Document Layout Analysis resources for development with PdfPig.☆634Oct 1, 2023Updated 2 years ago
- PAGE XML format collection for document image page content and more☆71Jan 16, 2026Updated 3 months ago
- Simple docker deployment of document layout analysis using detectron2☆19Nov 7, 2021Updated 4 years ago
- BoundaryNet - A Semi-Automatic Layout Annotation Tool☆24Dec 11, 2021Updated 4 years ago
- Transkriptionen von Fibeln (19. Jahrhundert)☆11Oct 31, 2025Updated 6 months ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an…☆23Sep 11, 2020Updated 5 years ago
- convert PubLayNet data into METS/PAGE-XML☆10Mar 17, 2020Updated 6 years ago
- Extract tables from PDF files (port of tabula-java)☆210Mar 17, 2025Updated last year
- RUN LENGTH SMOOTHING ALGORITHM(RLSA) is a method mainly used for block segmentation and text discrimination. It helps to extract the nece…☆24Jun 21, 2022Updated 3 years ago
- Smartcrop, a multi-pass context-aware cropping tool☆11Jan 31, 2018Updated 8 years ago
- OCR-D post-correction with encoder-attention-decoder LSTMs☆13May 1, 2025Updated last year
- SLUB Document Classification and Similarity Analysis☆10Aug 31, 2023Updated 2 years ago
- GloSAT Historical Measurement Table Dataset☆11Dec 3, 2025Updated 5 months ago
- Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s)☆17Sep 18, 2025Updated 7 months ago
- METS 1.x and METS 2 schemas☆26May 28, 2025Updated 11 months ago
- Converters for various file formats used for representing OCR☆12Apr 30, 2025Updated last year
- Grobid module for superconductor material and properties extraction☆22May 17, 2025Updated 11 months ago
- OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil☆11Sep 24, 2021Updated 4 years ago
- Training data from "Hauptphase I" of project "Digitalisierung historischer deutscher Zeitungen"☆12Dec 17, 2021Updated 4 years ago
- Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block…☆28Mar 16, 2020Updated 6 years ago
- DEPRECATED: Use https://github.com/18F/gapps-download instead☆10Oct 27, 2015Updated 10 years ago
- Wrapper for the kraken OCR engine☆12Jul 12, 2025Updated 9 months ago
- Etalab's Lab IA Pseudonymization Demo source code☆11Aug 3, 2023Updated 2 years ago
- Easy to use PDF CLI tool powered by PDFium and go-pdfium☆35Apr 15, 2026Updated 2 weeks ago
- A simple document layout analysis using Python-OpenCV☆127Aug 11, 2020Updated 5 years ago
- You Actually Look Twice At it☆41Apr 15, 2026Updated 2 weeks ago
- Detect textlines in document images☆91May 27, 2024Updated last year
- Automated listing of repos in GitHub with XML files containing teiHeader. Find a project using TEI today!☆17Updated this week
- Just a nodeJS wrapper for ghostscript☆12Jul 12, 2023Updated 2 years ago
- An Editor for creating simple or complex OCR workflows☆17Jun 13, 2024Updated last year
- Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)☆17Jan 20, 2026Updated 3 months ago
- Repository to use/train segmentation models for document layout analysis☆19Jan 13, 2022Updated 4 years ago
- Progressively enhance your HTML with dynamic data☆13May 1, 2018Updated 8 years ago
- Haversine distance between two points☆13Jun 20, 2023Updated 2 years ago
- Dokku buildpack for GitLab☆22Apr 5, 2015Updated 11 years ago
- RUN LENGTH SMOOTHING ALGORITHM(RLSA) is a method mainly used for block segmentation and text discrimination. It helps to extract the nece…☆29Nov 5, 2023Updated 2 years ago
- ☆13Jun 25, 2019Updated 6 years ago
- Just a test with gulp, lib-sass/bourbon, react and shoe☆37Jan 28, 2014Updated 12 years ago