UW-xDD / blackstack
Entity extraction from PDFs with Tesseract and Machine Learning
☆11 Updated 3 years ago
Alternatives and similar repositories for blackstack:
Users who are interested in blackstack are comparing it to the libraries listed below.
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 Updated 2 years ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 Updated 7 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 7 years ago
- Named Entity Recognition data for Europeana Newspapers ☆171 Updated last year
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- For extracting measurements and related entities from text ☆57 Updated 4 years ago
- spaCy-to-naf converter ☆21 Updated 9 months ago
- ☆25 Updated 5 years ago
- Scripts and microservice to feed an ElasticSearch with Wikidata and Inventaire entities, and keep those up-to-date ☆41 Updated 4 years ago
- Functional and structural analysis of tables in research papers (Table disentangling) ☆20 Updated 7 years ago
- Another next-generation event coding platform. ☆73 Updated 5 years ago
- Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters. ☆44 Updated last month
- OCR evaluation brought to you by University of Alicante ☆67 Updated 2 years ago
- An intelligent reading agent that understands text and translates it into Wikidata statements. ☆114 Updated 8 years ago
- The accompanying code and data for the Springer 2017 publication "What's missing in geographical parsing?" in Language Resources and Eval… ☆17 Updated 5 years ago
- IXA pipes Named Entity Tagger (http://ixa2.si.ehu.es/ixa-pipes). ☆32 Updated 5 years ago
- Locate and extract tables and figures in PDFs ☆41 Updated 3 years ago
- Model Training tool for MITIE ☆79 Updated 9 years ago
- Anafora is a web-based raw text annotation tool ☆241 Updated 2 years ago
- Recognition Models for Kraken and CLSTM ☆14 Updated 5 years ago
- A way to do annotations for NER. TALEN: Tool for Annotation of Low-resource ENtities ☆114 Updated 2 years ago
- High-level build project for all LAPDF-Text submodules ☆103 Updated 9 years ago
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 6 years ago
- 🚀 GUI for training spaCy models ☆54 Updated 3 years ago
- A collection of simple tutorials for using Fonduer ☆99 Updated 4 years ago
- Events and Situations Ontology ☆14 Updated 6 years ago
- Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic pr… ☆67 Updated last month
- Version 1.0 of the CrowdTruth Framework for crowdsourcing ground truth data, for training and evaluation of cognitive computing systems. … ☆60 Updated 6 years ago