USCDataScience / sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
☆416 Updated 2 years ago
Alternatives and similar repositories for sparkler
Users who are interested in sparkler are comparing it to the libraries listed below.
- A scalable, mature and versatile web crawler based on Apache Storm ☆921 Updated last week
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. ☆446 Updated last year
- A text tagger based on Lucene / Solr, using FST technology ☆176 Updated last year
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆272 Updated 2 years ago
- Spark RDD with Lucene's query and entity linkage capabilities ☆128 Updated last month
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆381 Updated 2 years ago
- Query preprocessor for Java-based search engines (Querqy Core and Solr implementation) ☆184 Updated last month
- Score documents with pure dot product / cosine similarity with ES ☆252 Updated 3 years ago
- Dice Solr Plugins from Simon Hughes Dice.com ☆87 Updated 4 years ago
- HBase as a TinkerPop Graph Database ☆259 Updated last week
- This project combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch. ☆211 Updated 10 years ago
- Banana for Solr - A Port of Kibana ☆671 Updated 11 months ago
- A set of reusable Java components that implement functionality common to any web crawler ☆244 Updated last week
- Elasticsearch entity resolution plugin based on Duke ☆209 Updated 5 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆216 Updated 2 years ago
- Carrot2 plugin for ElasticSearch ☆291 Updated 2 years ago
- The Apache Gora open source framework provides an in-memory data model and persistence for big data. ☆121 Updated last year
- Kite SDK ☆394 Updated 2 years ago
- Divolte Collector ☆281 Updated 3 years ago
- Data Integration Graph ☆206 Updated 6 years ago
- ☆75 Updated 5 years ago
- Serverless proxy for Spark cluster ☆326 Updated 4 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆191 Updated this week
- Mirror of Apache Lucene + Solr ☆48 Updated 5 years ago
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆128 Updated 9 years ago
- Schema Registry ☆16 Updated last year
- Solr query parser plugin that performs proper query-time synonym expansion. ☆150 Updated 4 years ago
- Mirror of Apache Atlas (Incubating) ☆94 Updated 2 years ago
- StreamLine - Streaming Analytics ☆164 Updated last year
- Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs ☆312 Updated last week