USCDataScience / sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
☆416 Updated 2 years ago
Alternatives and similar repositories for sparkler
Users who are interested in sparkler are comparing it to the libraries listed below.
- A scalable, mature and versatile web crawler based on Apache Storm ☆914 Updated this week
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. ☆446 Updated last year
- Spark RDD with Lucene's query and entity linkage capabilities ☆128 Updated last week
- HBase as a TinkerPop Graph Database ☆258 Updated 2 weeks ago
- This project combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch. ☆211 Updated 10 years ago
- Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs ☆313 Updated this week
- Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in… ☆1,037 Updated 2 years ago
- Apache Fluo ☆188 Updated 3 weeks ago
- Serverless proxy for Spark cluster ☆326 Updated 4 years ago
- Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark. ☆341 Updated 4 years ago
- Scalable query engine for web scraping/data mashup/acceptance QA, powered by Apache Spark ☆142 Updated last month
- Banana for Solr - A Port of Kibana ☆671 Updated 10 months ago
- Elasticsearch entity resolution plugin based on Duke ☆210 Updated 5 years ago
- Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform(http://bit… ☆282 Updated 6 years ago
- Real Time Analytics and Data Pipelines based on Spark Streaming ☆526 Updated 5 years ago
- High Performance Kafka Connector for Spark Streaming. Supports Multi Topic Fetch, Kafka Security. Reliable offset management in Zookeeper.… ☆632 Updated 3 years ago
- Support Highcharts in Apache Zeppelin ☆81 Updated 7 years ago
- ☆204 Updated 2 years ago
- Livy is an open source REST interface for interacting with Apache Spark from anywhere ☆1,007 Updated 2 years ago
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆382 Updated 2 years ago
- Build configuration-driven ETL pipelines on Apache Spark ☆159 Updated 2 years ago
- The Apache Gora open source framework provides an in-memory data model and persistence for big data. ☆121 Updated last year
- A tool for monitoring and tuning Spark jobs for efficiency. ☆358 Updated 2 years ago
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆272 Updated 2 years ago
- The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink. ☆554 Updated 4 years ago
- Schedoscope is a scheduling framework for painfree agile development, testing, (re)loading, and monitoring of your datahub, lake, or what… ☆96 Updated 5 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆216 Updated 2 years ago
- Mirror of Apache Bahir ☆336 Updated last year
- Data Integration Graph ☆206 Updated 6 years ago
- Simplifying robust end-to-end machine learning on Apache Spark. ☆472 Updated 8 years ago