USCDataScience / tika-dockers
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
☆21 Updated last year
Alternatives and similar repositories for tika-dockers
Users who are interested in tika-dockers are comparing it to the libraries listed below
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser ☆13 Updated 3 years ago
- Demonstration of searching PDF document with Solr, Tika, and Tesseract ☆31 Updated 9 months ago
- Advanced desktop search/corpus exploration prototype ☆21 Updated 4 years ago
- Efficient indexing and retrieval of OCR bounding boxes in Solr ☆22 Updated 6 years ago
- Apache Tika Server as a Docker Image ☆172 Updated 3 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆34 Updated 2 years ago
- Federated Knowledge Extraction Framework ☆192 Updated last year
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆269 Updated 2 years ago
- Highlighting various OCR formats directly in Solr ☆86 Updated this week
- Solr client and user interface for search ☆22 Updated last year
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 Updated 3 years ago
- View HOCR files with Mirador ☆29 Updated 7 years ago
- Image recognition on Spark cluster powered by Deeplearning4j and Apache Tika ☆14 Updated 8 years ago
- A pure JavaScript frontend for Elasticsearch search indices. ☆80 Updated 7 years ago
- Highly performant, lightweight framework for linked data processing. Supports RDFa, JSON-LD, RDF/XML and plain text formats, runs on Andr… ☆52 Updated 2 years ago
- 🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec ☆60 Updated 3 years ago
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test… ☆68 Updated 3 weeks ago
- EEA ElasticSearch RDF River Plugin ☆64 Updated 3 years ago
- Apache NiFi Custom Processor Extracting Text From Files with Apache Tika ☆35 Updated last year
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆38 Updated last year
- OAI-PMH plugin for Solr ☆23 Updated 4 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆191 Updated 2 months ago
- Ingest processor doing language detection for fields ☆72 Updated 2 years ago
- Entity resolution for Elasticsearch. ☆160 Updated 6 months ago
- CM-Well - a data warehouse for your knowledge graph ☆180 Updated 2 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆50 Updated 2 years ago
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- Update of the ISRI Analytic Tools for OCR Evaluation with UTF-8 support ☆57 Updated 4 years ago
- An RDF plugin for Solr ☆115 Updated 5 months ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆68 Updated last year