USCDataScience / sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
☆411 Updated last year
Related projects:
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. ☆446 Updated 9 months ago
- A scalable, mature and versatile web crawler based on Apache Storm ☆879 Updated this week
- Stanford CoreNLP wrapper for Apache Spark ☆422 Updated 5 years ago
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆268 Updated last year
- A text tagger based on Lucene / Solr, using FST technology ☆173 Updated 9 months ago
- Spark RDD with Lucene's query and entity linkage capabilities ☆124 Updated last week
- Score documents with pure dot product / cosine similarity with ES ☆249 Updated 3 years ago
- Real Time Analytics and Data Pipelines based on Spark Streaming ☆524 Updated 4 years ago
- A scalable machine learning library on Apache Spark ☆793 Updated 3 years ago
- A tool for monitoring and tuning Spark jobs for efficiency. ☆357 Updated last year
- Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning ☆1,786 Updated 3 years ago
- Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark. ☆335 Updated 4 years ago
- This project combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch. ☆209 Updated 9 years ago
- Serverless proxy for Spark cluster ☆326 Updated 3 years ago
- A set of reusable Java components that implement functionality common to any web crawler ☆233 Updated last month
- The Apache Gora open source framework provides an in-memory data model and persistence for big data. ☆118 Updated 6 months ago
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆381 Updated last year
- Elastic Search on Spark ☆112 Updated 9 years ago
- [DEPRECATED] Tensorflow wrapper for DataFrames on Apache Spark ☆749 Updated last month
- Mirror of Apache Bahir ☆337 Updated last year
- Docker build for Apache Spark ☆676 Updated 2 years ago
- Avro Data Source for Apache Spark ☆539 Updated 5 years ago
- Iceberg is a table format for large, slow-moving tabular data ☆476 Updated last year
- Elasticsearch Index Termlist ☆117 Updated 5 years ago
- Code to index HDFS to Solr using MapReduce ☆51 Updated 5 years ago
- Data Integration Graph ☆204 Updated 6 years ago
- Livy is an open source REST interface for interacting with Apache Spark from anywhere ☆1,008 Updated last year
- Spark Knowledge Base ☆334 Updated 3 years ago
- Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform(http://bit… ☆285 Updated 6 years ago
- Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in… ☆1,039 Updated last year