USCDataScience / tika-dockers
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
☆21 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for tika-dockers
- Stanford CoreNLP NER add-on for Apache Tika's NamedEntityParser ☆13 · Updated 2 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆32 · Updated last year
- Advanced desktop search/corpus exploration prototype ☆21 · Updated 3 years ago
- Homebase of the IPTC EXTRA project about rule-based text categorization ☆13 · Updated 7 years ago
- Extract statistics from Wikipedia Dump files. ☆26 · Updated 3 years ago
- Highly performant, lightweight framework for linked data processing. Supports RDFa, JSON-LD, RDF/XML and plain text formats, runs on Andr… ☆51 · Updated 2 years ago
- Apache OpenNLP Sandbox ☆42 · Updated this week
- Java parsers for different RDF serialisations + API + tools + JAX-RS integration ☆20 · Updated 3 years ago
- This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading … ☆16 · Updated 9 months ago
- A pure JavaScript frontend for ElasticSearch search indices. ☆79 · Updated 6 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆46 · Updated 2 years ago
- Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters. ☆44 · Updated last month
- Quickly analyze and explore email with advanced analytics and visualization. ☆55 · Updated 3 years ago
- CoronaWhy Common Research and Data Infrastructure for COVID-19 ☆13 · Updated 3 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆94 · Updated 6 years ago
- This is the facade for installation and access to the individual components ☆16 · Updated 6 years ago
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test… ☆63 · Updated last week
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆14 · Updated 9 years ago
- Image recognition on Spark cluster powered by Deeplearning4j and Apache Tika ☆14 · Updated 7 years ago
- Text similarity based on Word2Vec vectors. ☆10 · Updated 7 years ago
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali… ☆78 · Updated 4 years ago
- Python bindings for Neo4j ☆26 · Updated 10 years ago
- Demonstration of searching PDF document with Solr, Tika, and Tesseract ☆30 · Updated last month
- Meta-repository for the open-source version of the SUMMA Platform ☆16 · Updated 7 months ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆46 · Updated 2 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 · Updated 2 years ago
- PST extraction and analytic pipeline ☆37 · Updated 6 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 · Updated 7 years ago
- Simplified version of a Common Crawl fetcher ☆13 · Updated this week
- A toolkit for clustering web pages based on various similarity measures. ☆32 · Updated 3 years ago