tmbarchive / docker-ocropus
A small Docker build for the OCRopus OCR system.
☆19 Updated 6 years ago
Related projects
Alternatives and complementary repositories for docker-ocropus
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images. ☆24 Updated 9 years ago
- gathering point for open source OCR scripts and diffs ☆43 Updated 10 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- Docker container to provide Apache Tika RESTful API ☆40 Updated 8 years ago
- code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 8 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆37 Updated 8 months ago
- Ergonomic line-by-line transcription of scanned text. ☆48 Updated 3 years ago
- RDFSpace constructs a vector space from any RDF dataset which can be used for computing similarities between resources in that dataset. ☆39 Updated 11 years ago
- Presentations, tutorials and data for the OCR workshop at LMU ☆17 Updated 7 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 7 years ago
- A workflow system for Natural Language Processing. ☆21 Updated 5 years ago
- A fast, responsive HTML5 viewer for scanned items, developed for the World Digital Library. A project of the Library of Congress. Note: p… ☆22 Updated 9 years ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR). ☆23 Updated 9 years ago
- All that entity matching, resolution, normalization, enhancement and reconciliation madness, but with a focus on data, not platforms. ☆24 Updated 2 years ago
- The jQuery virtual stack plugin ☆54 Updated 6 years ago
- Using social media to steer web archiving and curation. ☆15 Updated 9 years ago
- A statistics extension for Google Refine. ☆33 Updated 13 years ago
- Structured Data from PDF image-based files ☆87 Updated 11 years ago
- search, dedupe, and media ingestion for mediachain ☆33 Updated 8 years ago
- This version of Rhizomer is archived, the current version is linked from: ☆14 Updated 6 years ago
- Vizlinc ☆14 Updated 8 years ago
- KEA 5.0 (keyphrase extraction software), modified to be an XML-RPC service ☆42 Updated 13 years ago
- Data Pipes for CSV ☆117 Updated last year
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆87 Updated last year
- Deployment of pywb as a CommonCrawl Index Server ☆21 Updated 7 years ago
- A text analysis interface for the humanities ☆27 Updated 13 years ago
- A module for Omeka S that provides an API for the Neatline 3 single page application ☆13 Updated last year