UW-xDD / blackstack
Entity extraction from PDFs with Tesseract and Machine Learning
☆11 Updated 3 years ago
Related projects
Alternatives and complementary repositories for blackstack
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 Updated 2 years ago
- Ergonomic line-by-line transcription of scanned text. ☆47 Updated 3 years ago
- Named Entity Recognition data for Europeana Newspapers ☆173 Updated last year
- Temporal Expression Recognition and Normalisation in Python ☆78 Updated 8 years ago
- A way to do annotations for NER. TALEN: Tool for Annotation of Low-resource ENtities ☆112 Updated 2 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 7 years ago
- Entity Extraction Text Processor ☆148 Updated last year
- An intelligent reading agent that understands text and translates it into Wikidata statements. ☆112 Updated 8 years ago
- 💫 Scripts, tools and resources for developing spaCy ☆125 Updated 5 years ago
- For extracting measurements and related entities from text ☆56 Updated 4 years ago
- Command line tool to extract figures, tables, and captions from scholarly documents in PDF form. ☆130 Updated 6 years ago
- Code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 8 years ago
- Stanford Pattern-based Information Extraction and Diagnostics -- Visualization ☆94 Updated 10 years ago
- Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters. ☆44 Updated 3 weeks ago
- Quill's library of open source NLP algorithms and data sets. ☆51 Updated 7 months ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 Updated 6 years ago
- IXA pipes Named Entity Tagger (http://ixa2.si.ehu.es/ixa-pipes). ☆31 Updated 5 years ago
- A system for connecting language to space and time. ☆64 Updated 4 years ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆62 Updated 7 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆261 Updated 2 years ago
- ☆25 Updated 5 years ago
- displaCy-ent.js: An open-source named entity visualiser for the modern web ☆198 Updated 6 years ago
- Events and Situations Ontology ☆13 Updated 6 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆94 Updated 6 years ago
- Practical Natural Language Processing Tools for Humans. Dependency Parsing, Syntactic Constituent Parsing, Semantic Role Labeling, Named … ☆192 Updated 7 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆46 Updated 2 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆92 Updated 2 years ago
- The CIS OCR PostCorrectionTool ☆40 Updated 2 years ago
- This is a REST Server endpoint built using Flask and Python. ☆24 Updated last year