BobLd/simple-docstrum

A step-by-step C# implementation of the Docstrum algorithm

☆24

Alternatives and similar repositories for simple-docstrum

Users that are interested in simple-docstrum are comparing it to the libraries listed below.

Wild-Rift / Document-Layout-Analysis
Tools for extracting figures, tables, text, etc. from a PDF document.
☆35 · Nov 25, 2020 · Updated 5 years ago

chulwoopack / docstrum
☆72 · Apr 3, 2018 · Updated 8 years ago

BobLd / DocumentLayoutAnalysis
Document Layout Analysis resources repos for development with PdfPig.
☆637 · Oct 1, 2023 · Updated 2 years ago

hpanwar08 / document-layout-analysis-app
Simple Docker deployment of document layout analysis using detectron2
☆19 · Nov 7, 2021 · Updated 4 years ago

ihdia / BoundaryNet
BoundaryNet - A Semi-Automatic Layout Annotation Tool
☆24 · Dec 11, 2021 · Updated 4 years ago
UB-Mannheim / Fibeln
Transcriptions of primers (Fibeln) from the 19th century
☆11 · Oct 31, 2025 · Updated 8 months ago

sam-ai / BertGrid
Implementation of BertGrid: https://arxiv.org/abs/1909.04948
☆30 · Apr 10, 2024 · Updated 2 years ago

bertsky / ocrd_publaynet
Convert PubLayNet data into METS/PAGE-XML
☆10 · Mar 17, 2020 · Updated 6 years ago

BobLd / camelot-sharp
A C# library to extract tabular data from PDFs (port of camelot Python version using PdfPig).
☆35 · Feb 4, 2022 · Updated 4 years ago

MBAigner / PDFSegmenter
This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an…
☆23 · Sep 11, 2020 · Updated 5 years ago

stuartemiddleton / glosat_table_dataset
GloSAT Historical Measurement Table Dataset
☆11 · Dec 3, 2025 · Updated 7 months ago

OCR-D / ocrd_pagetopdf
OCR-D wrapper for prima-pagetopdf
☆10 · Oct 30, 2025 · Updated 8 months ago

abchapman93 / ReHouSED
NLP system for identifying patient housing status in Veteran Affairs
☆11 · Feb 18, 2024 · Updated 2 years ago

IBM / ICDAR2021-SLP
ICDAR 2021 Competition on Scientific Literature Parsing
☆35 · Aug 20, 2020 · Updated 5 years ago
OCR-D / spec
Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s)
☆17 · Sep 18, 2025 · Updated 10 months ago

Vasistareddy / python-rlsa
Run Length Smoothing Algorithm (RLSA) is a method mainly used for block segmentation and text discrimination. It helps to extract the nece…
☆24 · Jun 21, 2022 · Updated 4 years ago

OCR-D / format-converters
Converters for various file formats used for representing OCR
☆12 · Apr 30, 2025 · Updated last year

Eonm / zotero
A Rust client for Zotero's API
☆14 · Nov 6, 2023 · Updated 2 years ago

VRI-UFPR / ocrd-gbn
OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil
☆11 · Sep 24, 2021 · Updated 4 years ago

ulb-sachsen-anhalt / ulb-zeitungsprojekt-hp1
Training data from "Hauptphase I" of project "Digitalisierung historischer deutscher Zeitungen"
☆12 · Dec 17, 2021 · Updated 4 years ago

BobLd / PdfPigMLNetBlockClassifier
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block…
☆29 · Mar 16, 2020 · Updated 6 years ago

etalab-ia / pseudo_app
Etalab's Lab IA Pseudonymization Demo source code
☆11 · Aug 3, 2023 · Updated 2 years ago

qurator-spk / sbb_textline_detection
Detect textlines in document images
☆90 · May 27, 2024 · Updated 2 years ago
philipallfrey / teihub
Automated listing of repos in GitHub with XML files containing teiHeader. Find a project using TEI today!
☆17 · Updated this week

rbaguila / document-layout-analysis
A simple document layout analysis using Python-OpenCV
☆125 · Aug 11, 2020 · Updated 5 years ago

bertsky / ocrd_detectron2
OCR-D wrapper for detectron2 based segmentation models
☆16 · May 1, 2025 · Updated last year

klippa-app / pdfium-cli
Easy to use PDF CLI tool powered by PDFium and go-pdfium
☆35 · Jun 11, 2026 · Updated last month

LivingSkyTechnologies / Document_Layout_Segmentation
Repository to use/train segmentation models for document layout analysis
☆19 · Jan 13, 2022 · Updated 4 years ago

Inist-CNRS / ghostscript-js
Just a Node.js wrapper for Ghostscript
☆12 · Jul 12, 2023 · Updated 3 years ago

18F / sheet-to-csv
DEPRECATED: Use https://github.com/18F/gapps-download instead
☆10 · Oct 27, 2015 · Updated 10 years ago

swapnil-ahlawat / Document_Layout_Analysis-MonkAI
DL models that take a document image file as input, locate the position of paragraphs, lines, images, etc. with their labels and confiden…
☆26 · Dec 31, 2020 · Updated 5 years ago

18F / api-program
A complete agency API program.
☆12 · Apr 27, 2017 · Updated 9 years ago
OCR-D / ocrd_calamari
Recognize text using Calamari OCR and the OCR-D framework
☆16 · May 13, 2025 · Updated last year

jodaiber / semantic_compound_splitting
A compound splitter based on the semantic regularities in the vector space of word embeddings.
☆16 · Mar 15, 2017 · Updated 9 years ago

PonteIneptique / YALTAi
You Actually Look Twice At it
☆42 · Apr 15, 2026 · Updated 3 months ago

integeruser / bowkin
A tool for patching binaries to use specific versions of glibc
☆22 · Jun 16, 2019 · Updated 7 years ago

y2labs-0sh / dada-api
☆11 · Nov 13, 2020 · Updated 5 years ago

microsoft / federalist
Federalist is a unified interface for publishing static government websites.
☆16 · Sep 6, 2023 · Updated 2 years ago

stefanklut / laypa
Layout analysis to find layout elements in documents (similar to P2PaLA)
☆22 · May 20, 2026 · Updated 2 months ago