LogicalSpark / docker-tikaserver
Apache Tika Server as a Docker Image
☆171 Updated 2 years ago
Alternatives and similar repositories for docker-tikaserver:
Users who are interested in docker-tikaserver are comparing it to the repositories listed below
- Convenience Docker images for Apache Tika Server ☆155 Updated 2 weeks ago
- GitHub mirror of "search/highlighter" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access… ☆100 Updated last week
- A bundle of useful Elasticsearch plugins ☆110 Updated 10 months ago
- FacetView is a pure JavaScript frontend for Elasticsearch. ☆291 Updated 9 years ago
- Index URLs in Common Crawl ☆193 Updated 7 years ago
- spaCy REST API, wrapped in a Docker container. ☆266 Updated 2 years ago
- "Stop worrying about Elasticsearch analyzers", my therapist says ☆155 Updated 3 years ago
- A plugin for language detection in Elasticsearch using Nakatani Shuyo's language detector ☆251 Updated 7 years ago
- Solr bulk indexing utility for the command line. ☆45 Updated 3 weeks ago
- Elasticsearch/Solr Sandbox for exploring explain information and tweaking ☆137 Updated 11 months ago
- Entity resolution for Elasticsearch. ☆158 Updated last month
- A Python library to detect and extract listing data from an HTML page. ☆108 Updated 7 years ago
- Web-based JavaScript GUI library for proofreading/editing hOCR ☆93 Updated 6 years ago
- Carrot2 plugin for Elasticsearch ☆292 Updated 2 years ago
- Bulk indexing command line tool for Elasticsearch. ☆280 Updated 3 weeks ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆94 Updated 2 years ago
- Mapper Attachments Type plugin for Elasticsearch ☆504 Updated last year
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆270 Updated 2 years ago
- A pure JavaScript frontend for Elasticsearch search indices. ☆79 Updated 6 years ago
- Launch AWS Elastic MapReduce jobs that process Common Crawl data. ☆49 Updated 8 years ago
- Docker container to provide Apache Tika RESTful API ☆40 Updated 9 years ago
- Decompounding Plugin for Elasticsearch ☆87 Updated 3 years ago
- Extract postal addresses from the DOM ☆66 Updated 12 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆95 Updated 6 years ago
- A text tagger based on Lucene / Solr, using FST technology ☆176 Updated last year
- Curated synonym files and Helpers for Elasticsearch Synonym Token Filter ☆64 Updated last year
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework ☆166 Updated 2 years ago
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆382 Updated 6 months ago
- 💫 REST microservices for various spaCy-related tasks ☆240 Updated 2 years ago