tmbarchive / docker-ocropus
A small Docker build for the OCRopus OCR system.
☆19 Updated 7 years ago
Alternatives and similar repositories for docker-ocropus:
Users who are interested in docker-ocropus are comparing it to the libraries listed below:
- Docker container to provide Apache Tika RESTful API ☆40 Updated 9 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images. ☆24 Updated 9 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- A platform for tools that do stuff with data ☆56 Updated 6 years ago
- Monitor datasets, get alerts when something happens ☆210 Updated 6 years ago
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 8 years ago
- Presentations, tutorials and data for the OCR workshop at LMU ☆17 Updated 7 years ago
- A workflow system for Natural Language Processing. ☆21 Updated 5 years ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 6 years ago
- Data Pipes for CSV ☆117 Updated 2 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 7 years ago
- [DEPRECATED] Please use https://github.com/frictionlessdata/specs ☆17 Updated 7 years ago
- gathering point for open source OCR scripts and diffs ☆43 Updated 10 years ago
- javascript multivariate data visualization ☆14 Updated 8 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 Updated 7 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated 11 months ago
- ☆13 Updated 9 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis. ☆79 Updated last year
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 Updated 2 years ago
- ☆36 Updated last year
- The OpenSextant Gazetteer is a collection of world-wide place name data ☆12 Updated 7 years ago
- A statistics extension for Google Refine. ☆33 Updated 13 years ago
- This version of Rhizomer is archived, the current version is linked from: ☆14 Updated 6 years ago
- ☆24 Updated 9 years ago
- The Python port of sucka. ☆20 Updated 9 years ago
- Simplifying the process of launching an open data repository. [RETIRED] ☆20 Updated 10 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 Updated 9 years ago
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆86 Updated last year
- Uses NLP methods to parse and classify contracts from The City of New Orleans ☆10 Updated 9 years ago