USCDataScience / tika-dockers
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
☆21 · Updated last year
Alternatives and similar repositories for tika-dockers
Users that are interested in tika-dockers are comparing it to the libraries listed below
- Demonstration of searching PDF documents with Solr, Tika, and Tesseract ☆32 · Updated last year
- Advanced desktop search/corpus exploration prototype ☆21 · Updated 4 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆34 · Updated 2 years ago
- Stanford CoreNLP NER add-on for Apache Tika's NamedEntityParser ☆13 · Updated 3 years ago
- Solrstrap is a query-result interface for Solr written in JavaScript, HTML and CSS ☆87 · Updated 8 years ago
- 🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec ☆59 · Updated 4 years ago
- A pure JavaScript front end for Elasticsearch search indices. ☆80 · Updated 7 years ago
- Solr AutoComplete implementation ☆59 · Updated 8 years ago
- Apache Tika Server as a Docker Image ☆174 · Updated 3 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆49 · Updated 3 years ago
- NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser ☆51 · Updated 8 months ago
- Federated Knowledge Extraction Framework ☆193 · Updated 2 years ago
- Solr client and user interface for search ☆22 · Updated last year
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test… ☆73 · Updated 3 weeks ago
- Trying to generate name synonyms from Wikidata ☆35 · Updated 5 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆277 · Updated 3 years ago
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t… ☆132 · Updated 2 months ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 · Updated 8 years ago
- This is a REST Server endpoint built using Flask and Python. ☆24 · Updated 3 years ago
- Homebase of the IPTC EXTRA project about rule-based text categorization ☆13 · Updated 8 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 · Updated 4 years ago
- spaCy REST API, wrapped in a Docker container. ☆268 · Updated 3 years ago
- ☆185 · Updated 7 years ago
- SOLR bulk indexing utility for the command line. ☆45 · Updated 2 months ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 · Updated 5 years ago
- An RDF plugin for Solr ☆114 · Updated last year
- Simple RESTful API server running your own machine translation model. Docker image modified from mbartoli/easy-smt ☆11 · Updated 6 years ago
- Now included in rigour ☆152 · Updated 2 months ago
- Entity resolution for Elasticsearch. ☆166 · Updated last month
- Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com… ☆135 · Updated 3 months ago