USCDataScience / sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
☆415 · Updated 2 years ago
Alternatives and similar repositories for sparkler
Users who are interested in sparkler compare it to the libraries listed below.
- A scalable, mature and versatile web crawler based on Apache Storm ☆907 · Updated this week
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. ☆446 · Updated last year
- Kite SDK ☆393 · Updated 2 years ago
- The Apache Gora open source framework provides an in-memory data model and persistence for big data. ☆121 · Updated last year
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆382 · Updated 2 years ago
- Elasticsearch entity resolution plugin based on Duke ☆210 · Updated 4 years ago
- A set of reusable Java components that implement functionality common to any web crawler ☆244 · Updated 3 weeks ago
- Real Time Analytics and Data Pipelines based on Spark Streaming ☆526 · Updated 5 years ago
- Build configuration-driven ETL pipelines on Apache Spark ☆159 · Updated 2 years ago
- Scalable query engine for web scraping/data mashup/acceptance QA, powered by Apache Spark ☆142 · Updated 2 weeks ago
- Score documents with pure dot product / cosine similarity with ES ☆251 · Updated 3 years ago
- This code base is retained for historical interest only, please visit Apache Incubator Repo for latest one ☆560 · Updated 2 years ago
- Code to index HDFS to Solr using MapReduce ☆52 · Updated 6 years ago
- Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in… ☆1,037 · Updated 2 years ago
- Divolte Collector ☆281 · Updated 3 years ago
- KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apach… ☆1,182 · Updated 8 years ago
- Spark RDD with Lucene's query and entity linkage capabilities ☆127 · Updated last month
- Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit… ☆282 · Updated 6 years ago
- Livy is an open source REST interface for interacting with Apache Spark from anywhere ☆1,007 · Updated 2 years ago
- spark + drools ☆102 · Updated 2 years ago
- Scripts for generating Grafana dashboards for monitoring Spark jobs ☆242 · Updated 10 years ago
- Apache Fluo ☆188 · Updated last week
- This project combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch. ☆211 · Updated 10 years ago
- A text tagger based on Lucene / Solr, using FST technology ☆176 · Updated last year
- Continuous scalable web crawler built on top of Flink and crawler-commons ☆52 · Updated 6 years ago
- Mirror of Apache Hivemall (incubating) ☆312 · Updated 2 years ago
- HBase as a TinkerPop Graph Database ☆257 · Updated last month
- Banana for Solr - A Port of Kibana ☆670 · Updated 9 months ago
- A Scrapy pipeline which sends items to an Elasticsearch server ☆328 · Updated 2 years ago
- [PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streamin… ☆725 · Updated 3 years ago