linuxlizard/page_segmentation

Page Segmentation Code. I'm working with OCRopus and the UW-III data set to test how the page segmentation algorithms work with smaller strips of an image rather than the entire image.
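
The description above mentions running segmentation on smaller strips of an image instead of the whole page. As a rough illustration only (this is not code from the repository, and the names `strip_bounds` and `split_into_strips` are hypothetical), a page stored as a list of pixel rows could be cut into horizontal strips like this:

```python
def strip_bounds(image_height, strip_height):
    """Return (top, bottom) row ranges that cover an image of the given height.

    Hypothetical helper for illustration; the last strip may be shorter
    than strip_height when the height is not an exact multiple.
    """
    if image_height < 0 or strip_height <= 0:
        raise ValueError("image_height must be >= 0 and strip_height must be > 0")
    return [
        (top, min(top + strip_height, image_height))
        for top in range(0, image_height, strip_height)
    ]


def split_into_strips(rows, strip_height):
    """Split a sequence of pixel rows into consecutive horizontal strips."""
    return [rows[top:bottom] for top, bottom in strip_bounds(len(rows), strip_height)]
```

Each strip could then be passed to a segmentation routine on its own and the results compared with those from the full page; how the repository actually sizes or evaluates its strips is not stated here.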

☆20

Alternatives and similar repositories for page_segmentation

Users who are interested in page_segmentation are comparing it to the libraries listed below.

bhavishya235 / Web-Classification
This project deals with hierarchical classification of web pages based on dmoz dataset.
☆14 · Apr 10, 2014 · Updated 11 years ago
ldodds / slug
A semantic web crawler
☆20 · Sep 20, 2010 · Updated 15 years ago
nik0spapp / sdalg
Web page segmentation and noise removal
☆55 · Feb 4, 2024 · Updated 2 years ago
zygmuntz / stardose
A recommender system for GitHub repositories
☆14 · Jun 21, 2014 · Updated 11 years ago
socialsensor / storm-focused-crawler
Collects multimedia content shared through social networks.
☆19 · Feb 18, 2015 · Updated 11 years ago
nik0spapp / wmil
Weighted multiple-instance learning algorithm
☆18 · Oct 9, 2018 · Updated 7 years ago
trec-kba / streamcorpus
Common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text
☆35 · Sep 30, 2016 · Updated 9 years ago
DIVA-DIA / DIVA_Layout_Analysis_Evaluator
Layout Analysis Evaluator for the ICDAR 2017 competition on Layout Analysis for Challenging Medieval Manuscripts
☆22 · May 17, 2019 · Updated 6 years ago
HannaRiver / Pixel-Anchor
Reproduction of the paper "Pixel-Anchor: A Fast Oriented Scene Text Detector with Combined Networks"
☆26 · Nov 26, 2018 · Updated 7 years ago
chulwoopack / docstrum
☆70 · Apr 3, 2018 · Updated 7 years ago
phybrain / efficientdensenet_crnn
Memory-efficient DenseNet + LSTM + CTC implementation for Chinese text recognition
☆31 · Jun 21, 2022 · Updated 3 years ago
ikivanc / Document-Classification-and-Post-OCR-Key-Value-Extraction
Document Classification and Post-OCR Key Value Extraction
☆62 · Nov 6, 2019 · Updated 6 years ago
gt-big-data / retina-clusterer
The goal of this experiment is to take articles and certain metadata and group them by topic.
☆11 · Apr 14, 2016 · Updated 9 years ago
internetarchive / surt
Sort-friendly URI Reordering Transform (SURT) python module
☆44 · Sep 11, 2025 · Updated 5 months ago
opconty / keras_std_plus_plus
This repository is the official implementation of `A Semantic-based Arbitrarily-Oriented Scene Text Detector`(named STD++ as it is the im…
☆29 · Aug 14, 2019 · Updated 6 years ago
BlockCatIO / token-sale
BlockCAT token sale smart contracts.
☆11 · Oct 19, 2017 · Updated 8 years ago
NaharD / tts
Ukrainian text-to-speech (Ukrainian TTS)
☆10 · Oct 25, 2019 · Updated 6 years ago
wdickers / Focused_Crawler
Focused Crawler for VT's CTRNet
☆10 · May 13, 2013 · Updated 12 years ago
rodricios / crawl-to-the-future
An attempt at creating a gold standard dataset for backtesting yesterday & today's content-extractors
☆35 · Mar 19, 2015 · Updated 10 years ago
miha-stopar / extract-repetitions
Extract (DOM tree) repetitions from a webpage
☆12 · Jan 13, 2014 · Updated 12 years ago
quantmind / pulsar-odm
Green SqlAlchemy extensions for pulsar
☆11 · Nov 24, 2017 · Updated 8 years ago
cobrce / ShiftPWM
Arduino library to generate a PWM signal over a shift register (74HC595)
☆12 · Sep 25, 2020 · Updated 5 years ago
geoparser / geolocator-3.0
☆12 · Oct 25, 2015 · Updated 10 years ago
aws-solutions / amazon-marketing-cloud-insights-on-aws
Amazon Marketing Cloud Insights on AWS helps advertisers and agencies running campaigns on Amazon Ads to easily deploy AWS services to st…
☆16 · Nov 3, 2025 · Updated 3 months ago
bikeindex / bikewise
Bicycle Incident reporting
☆13 · Jul 22, 2022 · Updated 3 years ago
moravianlibrary / MEditor
Digitization information system built on top of Fedora repository
☆16 · Jan 15, 2019 · Updated 7 years ago
dhlab-epfl / dhSegment
Generic framework for historical document processing
☆382 · Jul 9, 2021 · Updated 4 years ago
johnhany / textRotCorrect
DFT-based text image rotation correction using OpenCV
☆39 · Nov 25, 2013 · Updated 12 years ago
gwu-libraries / social-feed-manager
"Old SFM" -- manage rules and streams from social data sources, starting with twitter.
☆86 · Aug 10, 2023 · Updated 2 years ago
oleiade / Elevator
Elevator is an open source, on-disk key-value store. Provides high-performance bulk read-write operations over very large datasets while …
☆70 · May 14, 2014 · Updated 11 years ago
discourse-lab / DiscourseSegmenter
A collection of various discourse segmenters
☆10 · Jun 30, 2017 · Updated 8 years ago
hsnr-gamera / gamera-4
Gamera 4 for Python 3
☆14 · May 16, 2025 · Updated 9 months ago
Gazler / rapidash
Rapidly develop your API client
☆144 · Nov 10, 2015 · Updated 10 years ago
AbeHandler / contracts_nlp
Uses NLP methods to parse and classify contracts from The City of New Orleans
☆10 · Mar 23, 2015 · Updated 10 years ago
viirya / flickr_fetcher
Research codes for image interestingness
☆17 · Dec 6, 2017 · Updated 8 years ago
dcolish / Cockerel
An Online Logic Assistant Based on Coq
☆25 · Feb 15, 2012 · Updated 14 years ago
valueflows / agent
agent has moved to https://lab.allmende.io/valueflows/agent
☆10 · Jun 23, 2020 · Updated 5 years ago
stardog-union / stardog-spring
Spring integration with Stardog RDF database
☆18 · Jan 27, 2025 · Updated last year
Fakerr / go-paddle
Go library for accessing the Paddle API
☆10 · Apr 14, 2022 · Updated 3 years ago