tmbarchive / docker-ocropus
A small Docker build for the OCRopus OCR system.
☆19 Updated 6 years ago
Related projects:
- Docker container to provide Apache Tika RESTful API ☆40 Updated 8 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images. ☆24 Updated 8 years ago
- code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 7 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆38 Updated 10 years ago
- Ergonomic line-by-line transcription of scanned text. ☆47 Updated 3 years ago
- A statistics extension for Google Refine. ☆33 Updated 13 years ago
- A simple PDF transcription project for PyBossa ☆19 Updated 8 years ago
- A text analysis interface for the humanities ☆27 Updated 13 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 Updated 9 years ago
- gathering point for open source OCR scripts and diffs ☆43 Updated 10 years ago
- Work relating to the OCR wish-list item "figure out an algorithm that would separate images into sets with no handwriting, little handwri… ☆20 Updated 11 years ago
- This version of Rhizomer is archived, the current version is linked from: ☆14 Updated 6 years ago
- Structured Data from PDF image-based files ☆87 Updated 11 years ago
- Tools for working with Optical Character Recognition output ☆16 Updated 10 years ago
- Serapis is a sentence identifier and modeling pipeline / built for Wordnik ☆24 Updated 8 years ago
- Experiments mining image collections using OpenCV ☆64 Updated 9 years ago
- [DEPRECATED] Please use https://github.com/frictionlessdata/specs ☆17 Updated 6 years ago
- A workflow system for Natural Language Processing. ☆21 Updated 4 years ago
- A fast, responsive HTML5 viewer for scanned items, developed for the World Digital Library. A project of the Library of Congress. Note: p… ☆22 Updated 9 years ago
- Data Pipes for CSV ☆117 Updated last year
- RDFSpace constructs a vector space from any RDF dataset which can be used for computing similarities between resources in that dataset. ☆39 Updated 10 years ago
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆87 Updated last year
- Uses NLP methods to parse and classify contracts from The City of New Orleans ☆10 Updated 9 years ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR). ☆22 Updated 9 years ago
- ☆24 Updated 9 years ago
- Training files produced for and by the Tesseract OCR engine for work on the Early Modern OCR Project (eMOP) ☆36 Updated 8 years ago
- All that entity matching, resolution, normalization, enhancement and reconciliation madness, but with a focus on data, not platforms. ☆23 Updated 2 years ago
- ☆12 Updated this week
- Tools to analyze web archives ☆20 Updated 8 years ago
- See https://github.com/tworavens/tworavens for current repository for this project and http://2ra.vn for project pages. ☆30 Updated 6 years ago