EntilZha / spark-s3
Spark package that reads S3 files natively instead of through Hadoop, improving speed
☆11 Updated 8 years ago
Alternatives and similar repositories for spark-s3:
Users who are interested in spark-s3 are comparing it to the libraries listed below:
- A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster. ☆47 Updated 8 years ago
- functionstest ☆33 Updated 8 years ago
- Scala stuff ☆18 Updated 5 years ago
- Library for organizing batch processing pipelines in Apache Spark ☆41 Updated 8 years ago
- Provides a SQL interface to your TinkerPop enabled graph db ☆74 Updated last year
- An example of using Avro and Parquet in Spark SQL ☆60 Updated 9 years ago
- Simple Spark app that reads and writes Avro data ☆31 Updated 9 years ago
- Use Cascading Taps and Scalding DSL with Spark ☆49 Updated 8 years ago
- Project for the talk on NLP using LSTM implementation from DL4J on Spark ☆20 Updated 8 years ago
- Topic Modeling with LDA in Scala and Spark ☆31 Updated 6 years ago
- something to help you spark ☆65 Updated 6 years ago
- Deriving Spark DataFrame schemas from case classes ☆44 Updated 7 months ago
- Utilities for Apache Spark ☆34 Updated 8 years ago
- Cascading on Apache Flink® ☆54 Updated 11 months ago
- Spark RDD with Lucene's query and entity linkage capabilities ☆124 Updated this week
- An umbrella project for multiple implementations of model serving ☆46 Updated 7 years ago
- MLeap allows for easily putting Spark ML pipelines into production ☆78 Updated 8 years ago
- Memory consumption estimator for Scala/Java ☆26 Updated 10 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 Updated last year
- Factorization Machines on Spark and Glint ☆25 Updated 8 years ago
- Spark Connector for Hazelcast ☆22 Updated 3 years ago
- ☆22 Updated 10 years ago
- Example projects for using Spark and Cassandra With DSE Analytics ☆58 Updated last year
- Scripts for parsing / making sense of yarn logs ☆52 Updated 8 years ago
- Joins for skewed datasets in Spark ☆57 Updated 7 years ago
- Schema Registry integration for Apache Spark ☆40 Updated 2 years ago
- Examples for Fast Data Processing with Spark ☆59 Updated 11 years ago
- Scriptable scheduler for periodical Hadoop workflows ☆22 Updated 6 years ago
- Starter project for building MemSQL Streamliner Pipelines ☆32 Updated 7 years ago
- Docker image for apache zeppelin ☆38 Updated 7 years ago