UW-xDD / blackstack
Entity extraction from PDFs with Tesseract and Machine Learning
☆11 Updated 4 years ago
Alternatives and similar repositories for blackstack
Users who are interested in blackstack are comparing it to the libraries listed below.
- For extracting measurements and related entities from text ☆58 Updated 5 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆33 Updated 3 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆39 Updated last year
- An intelligent reading agent that understands text and translates it into Wikidata statements. ☆116 Updated 8 years ago
- Named Entity Recognition data for Europeana Newspapers ☆171 Updated 2 years ago
- A lightweight server to allow HTTP requests to the Stanford Named Entity Recognizer and a heavily modified CLAVIN geoparser. ☆119 Updated 3 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆49 Updated 2 years ago
- Quick and easy geographical functions in Python ☆41 Updated 3 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆95 Updated 3 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 Updated 3 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 8 years ago
- Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters. ☆44 Updated this week
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 7 years ago
- Python script for matching a list of messy addresses against a gazetteer using dedupe. ☆63 Updated 5 years ago
- Data Server for Topic Models ☆121 Updated 2 years ago
- Ergonomic line-by-line transcription of scanned text. ☆52 Updated 4 years ago
- 💫 Scripts, tools and resources for developing spaCy ☆126 Updated 6 years ago
- pyaddress is an address parsing library, taking the guesswork out of using addresses in your applications. We use it as part of our apart… ☆100 Updated 5 years ago
- ☆25 Updated 6 years ago
- IXA pipes Named Entity Tagger (http://ixa2.si.ehu.es/ixa-pipes). ☆32 Updated 6 years ago
- Code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 8 years ago
- Create a Geonames gazetteer index in Elasticsearch ☆77 Updated last year
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆62 Updated 8 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆95 Updated 6 years ago
- Stanford Pattern-based Information Extraction and Diagnostics -- Visualization ☆93 Updated 10 years ago
- Another next-generation event coding platform. ☆76 Updated 6 years ago
- Events and Situations Ontology ☆14 Updated 7 years ago
- Earth Science Knowledge Graph - An Automatic Approach to Building Earth Science Knowledge Graph to Improve Data Discovery ☆20 Updated 3 years ago
- Harvard WorldMap is a heavily modified fork of GeoNode 1.4 which has recently been migrated to GeoNode 2.10. WorldMap is being made avail… ☆95 Updated 5 years ago