momer / nutch-selenium
☆28 Updated 8 years ago
Alternatives and similar repositories for nutch-selenium:
Users who are interested in nutch-selenium are comparing it to the libraries listed below.
- A Nutch 2.2.1 plugin which allows users to shuffle off the responsibility for retrieving pages to a selenium hub/node spoke system. This … ☆16 Updated 8 years ago
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p… ☆142 Updated 2 years ago
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition ☆136 Updated 8 years ago
- How to spot first stories on Twitter using Storm. ☆125 Updated last year
- Storm / Solr Integration ☆19 Updated 11 months ago
- Fusion demo app searching open-source project data from the Apache Software Foundation ☆42 Updated 6 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆212 Updated 2 years ago
- A Query Autofiltering SearchComponent for Solr that can translate free-text queries into structured queries using index metadata ☆28 Updated 6 years ago
- faceted search engine ☆41 Updated 9 years ago
- Using latent Dirichlet allocation (LDA) in Apache Lucene ☆58 Updated 12 years ago
- Muppet ☆126 Updated 3 years ago
- The next generation of open source search ☆91 Updated 7 years ago
- ☆65 Updated 8 years ago
- Apache OpenNLP Sandbox ☆42 Updated this week
- distributed realtime searchable database ☆116 Updated 10 years ago
- The first Open Source document analysis platform ☆65 Updated 3 years ago
- Set of real-time stream processing algorithms that can be used by big data streaming platforms ☆72 Updated 4 years ago
- A Text Classification API in Java originally developed by DigitalPebble Ltd. The API is independent from the ML implementations used and … ☆48 Updated 3 years ago
- Code to index HDFS to Solr using MapReduce ☆51 Updated 6 years ago
- Fabric-based framework for deploying and managing SolrCloud clusters in the cloud. ☆90 Updated 5 years ago
- Katta - distributed Lucene ☆60 Updated 11 years ago
- Java implementation of the TextRank algorithm by Mihalcea, et al. ☆75 Updated 3 years ago
- Allows a Storm topology to consume an AMQP exchange as an input source. ☆59 Updated 11 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆36 Updated 9 months ago