UW-xDD / blackstack
Entity extraction from PDFs with Tesseract and Machine Learning
☆10 Updated 4 years ago
Alternatives and similar repositories for blackstack
Users who are interested in blackstack are comparing it to the libraries listed below.
- Ergonomic line-by-line transcription of scanned text. ☆54 Updated this week
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 Updated 7 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆73 Updated 8 years ago
- Ocular is a state-of-the-art historical OCR system. ☆266 Updated last year
- An intelligent reading agent that understands text and translates it into Wikidata statements. ☆116 Updated 9 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆49 Updated 3 years ago
- Create a Geonames gazetteer index in Elasticsearch ☆79 Updated 2 years ago
- ☆25 Updated 6 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆96 Updated 3 years ago
- An expandable and scalable OCR pipeline ☆89 Updated 8 years ago
- Palladio Application ☆43 Updated 4 years ago
- Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters. ☆46 Updated last week
- Entity Extraction Text Processor ☆149 Updated 2 years ago
- PYBOSSA is the ultimate crowdsourcing framework (aka microtasking) to analyze or enrich data that can't be processed by machines alone. ☆762 Updated last year
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆277 Updated 3 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆95 Updated 7 years ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 7 years ago
- Named Entity Recognition data for Europeana Newspapers ☆173 Updated 2 years ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆113 Updated last year
- For extracting measurements and related entities from text ☆58 Updated 5 years ago
- OCR evaluation brought to you by University of Alicante ☆67 Updated 3 years ago
- Events and Situations Ontology ☆14 Updated 7 years ago
- The CIS OCR PostCorrectionTool ☆44 Updated 3 years ago
- Presentations, tutorials and data for the OCR workshop at LMU ☆16 Updated 8 years ago
- Data Server for Topic Models ☆122 Updated 2 years ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆62 Updated 9 years ago
- Temporal Expression Recognition and Normalisation in Python ☆77 Updated 10 years ago
- The hOCR Embedded OCR Workflow and Output Format ☆75 Updated last year
- Recognition Models for Kraken and CLSTM ☆16 Updated 6 years ago