artunit / ossocr
gathering point for open source OCR scripts and diffs
☆43 Updated 10 years ago
Alternatives and similar repositories for ossocr:
Users who are interested in ossocr are comparing it to the repositories listed below.
- A small Docker built for the OCRopus OCR system. ☆20 Updated 7 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 7 years ago
- Experiments mining image collections using OpenCV ☆64 Updated 9 years ago
- ☆16 Updated 10 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images. ☆24 Updated 9 years ago
- An extension to Google Refine that enables graphical mapping of Google Refine project data to an RDF skeleton and then exporting it in RD… ☆94 Updated last year
- Training files produced for and by the Tesseract OCR engine for work on the Early Modern OCR Project (eMOP) ☆36 Updated 9 years ago
- A backend store for the Annotator ☆179 Updated 9 years ago
- Exploring extracting tables from a PDF to CSV using PDF.JS ☆103 Updated 8 years ago
- This repository is DEPRECATED please goto: ☆18 Updated 8 years ago
- This repository is community oriented wiki and issue tracker without any code. It is the community documentation and communication channe… ☆22 Updated 6 years ago
- SKOS Support for Apache Lucene and Solr ☆56 Updated 3 years ago
- RDFSpace constructs a vector space from any RDF dataset which can be used for computing similarities between resources in that dataset. ☆39 Updated 11 years ago
- Efficient indexing and retrieval of OCR bounding boxes in Solr ☆22 Updated 6 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 Updated 2 years ago
- A MongoDB implementation of the W3C Web Annotation Protocol ☆17 Updated 2 years ago
- See https://github.com/tworavens/tworavens for current repository for this project and http://2ra.vn for project pages. ☆30 Updated 6 years ago
- Semiautomatic annotation editor for rich html editors. ☆60 Updated 11 years ago
- code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 8 years ago
- Dockerized version of Google's SyntaxNet Parser and POS tagger. ☆42 Updated 6 years ago
- Javascript implementation of the W3C Web Annotation Data Model, useful for Web Extensions and serializing references to specific resource… ☆24 Updated 6 years ago
- Large RDF hierarchies as vector spaces ☆20 Updated 10 years ago
- This repository contains tool and collections dataset for detecting off-topic pages from Web archived collections. ☆18 Updated 9 years ago
- Topic Modeling Workflow in Python ☆16 Updated 2 years ago
- Text-Induced Corpus Clean-up ☆20 Updated last year
- KEA 5.0 (keyphrase extraction software), modified to be an XML-RPC service ☆42 Updated 13 years ago
- Docker container to provide Apache Tika RESTful API ☆41 Updated 9 years ago
- Specification of NAF, the NLP annotation format ☆21 Updated 4 years ago
- All that entity matching, resolution, normalization, enhancement and reconciliation madness, but with a focus on data, not platforms. ☆24 Updated 3 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆95 Updated 6 years ago