chrismattmann / imagecat
ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest tens of millions of files (images, but it could be extended to other files) in place, and to extract metadata and OCR information from those files/images using Tika and Tesseract OCR.
☆95 · Updated 6 years ago
Alternatives and similar repositories for imagecat:
Users interested in imagecat are comparing it to the libraries listed below.
- Interactive Image similarity and Visual Search and Retrieval application ☆96 · Updated 11 months ago
- Topic modeling web application ☆40 · Updated 9 years ago
- Faceted search engine for domain-specific exploration of the Web ☆45 · Updated 8 years ago
- Viewers for statistics and dashboarding of Domain Search Engine data ☆123 · Updated 9 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 · Updated 9 years ago
- Browser add-on and web server to support collection and analysis of web browsing data. ☆13 · Updated 9 years ago
- General Architecture for Text Engineering ☆49 · Updated 9 years ago
- ☆43 · Updated 9 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆34 · Updated last year
- Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features. ☆108 · Updated this week
- Facet Search interface for MEMEX. ☆13 · Updated 10 years ago
- ☆20 · Updated 7 years ago
- Stanford CoreNLP NER add-on for Apache Tika's NamedEntityParser ☆13 · Updated 3 years ago
- MITIE: library and tools for information extraction ☆29 · Updated 10 years ago
- Aperture-Tiles uses familiar web-based map interactions to allow exploration of arbitrarily huge data sets. ☆74 · Updated last year
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 · Updated 3 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆37 · Updated last year
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis. ☆79 · Updated last year
- Index URLs in Common Crawl ☆194 · Updated 7 years ago
- ☆28 · Updated 8 years ago
- Simple search results with Solr and EmberJS ☆58 · Updated 6 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆33 · Updated 3 years ago
- Nutch-Python is a Python binding to the Apache Nutch™ REST services allowing Nutch to be called natively in the Python community. ☆39 · Updated 9 years ago
- The WikiBrain Java library enables researchers and developers to incorporate state-of-the-art Wikipedia-based algorithms and technologies… ☆91 · Updated 6 years ago
- Seed acquisition tool to bootstrap focused crawlers ☆23 · Updated 7 years ago
- Python toolkit for pluggable algorithms and data structures for multimedia-based machine learning. ☆78 · Updated last year
- Human-Powered Data Analysis with Mechanical Turk ☆300 · Updated 12 years ago
- FacetView is a pure javascript frontend for ElasticSearch. ☆290 · Updated 9 years ago
- Solr Dictionary Annotator (Microservice for Spark) ☆71 · Updated 5 years ago
- Simple RESTful API server running your own machine translation model. Docker image modified from mbartoli/easy-smt ☆11 · Updated 5 years ago