LogicalSpark / docker-tikaserver
Apache Tika Server as a Docker Image
☆172 · Updated 2 years ago
Alternatives and similar repositories for docker-tikaserver:
Users interested in docker-tikaserver are comparing it to the repositories listed below.
- Bulk indexing command line tool for Elasticsearch. ☆280 · Updated 2 weeks ago
- FacetView is a pure JavaScript frontend for ElasticSearch. ☆291 · Updated 9 years ago
- "Stop worrying about Elasticsearch analyzers", my therapist says ☆155 · Updated 3 years ago
- spaCy REST API, wrapped in a Docker container. ☆267 · Updated 2 years ago
- Tesseract 4 OCR Runtime Environment - Docker Container ☆99 · Updated 6 years ago
- Decompounding Plugin for Elasticsearch ☆87 · Updated 4 years ago
- Starter Reverse Proxy Configuration for Solr ☆47 · Updated 9 years ago
- 💫 REST microservices for various spaCy-related tasks ☆240 · Updated 2 years ago
- Automatically exported from code.google.com/p/chromium-compact-language-detector ☆161 · Updated 4 years ago
- ☆184 · Updated 6 years ago
- Launch AWS Elastic MapReduce jobs that process Common Crawl data. ☆49 · Updated 8 years ago
- A Python library to detect and extract listing data from HTML pages. ☆108 · Updated 7 years ago
- Solrstrap is a Query-Result interface for Solr written in JavaScript, HTML and CSS ☆86 · Updated 7 years ago
- Index URLs in Common Crawl ☆194 · Updated 7 years ago
- Entity resolution for Elasticsearch. ☆159 · Updated 2 months ago
- Convenience Docker images for Apache Tika Server ☆169 · Updated last month
- Entity Extraction Text Processor ☆148 · Updated last year
- A pure JavaScript frontend for ElasticSearch search indices. ☆79 · Updated 7 years ago
- Solr bulk indexing utility for the command line. ☆45 · Updated 2 weeks ago
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser ☆13 · Updated 3 years ago
- Elasticsearch lemmatizer for 15 languages ☆105 · Updated 3 months ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 · Updated 7 years ago
- NER toolkit for HTML data ☆259 · Updated 10 months ago
- GitHub mirror of "search/highlighter" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access… ☆102 · Updated last month
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework ☆166 · Updated 2 years ago
- Curated synonym files and helpers for Elasticsearch Synonym Token Filter ☆64 · Updated last year
- Search Management UI ☆54 · Updated 4 months ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆95 · Updated 6 years ago
- A reference mechanism for including content from other documents during the Elasticsearch analysis field mapping phase ☆35 · Updated 5 years ago
- A platform for backing crowdsourcing websites, built in golang for Elasticsearch ☆360 · Updated 4 years ago