Scripts used to set up a Spark cluster on EC2
☆387Nov 22, 2017Updated 8 years ago
Alternatives and similar repositories for spark-ec2
Users who are interested in spark-ec2 are comparing it to the libraries listed below.
- A command-line tool for launching Apache Spark clusters.☆651Dec 13, 2024Updated last year
- Mirror of Apache Toree (Incubating)☆749Updated this week
- [NOTE: Repository has moved to github.com/amplab/spark-ec2]☆57Aug 10, 2015Updated 10 years ago
- GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs☆1,136Updated this week
- Interactive and Reactive Data Science using Scala and Spark.☆3,150May 16, 2023Updated 2 years ago
- This repository holds the Amazon Elastic MapReduce sample bootstrap actions☆613Jun 5, 2023Updated 2 years ago
- Docker build for Apache Spark☆671Dec 30, 2021Updated 4 years ago
- Distributed Neural Networks for Spark☆611Jul 23, 2020Updated 5 years ago
- Arteria is a high performance message channel system for IPC and network communication☆12Jun 21, 2017Updated 8 years ago
- REST job server for Apache Spark☆2,843Jul 8, 2025Updated 7 months ago
- Sparkling Water provides H2O functionality inside Spark cluster☆977Nov 5, 2025Updated 4 months ago
- Base classes to use when writing tests with Spark☆1,549Dec 22, 2025Updated 2 months ago
- Spark 2.0 Scala Machine Learning examples☆78Oct 4, 2019Updated 6 years ago
- Simplifying robust end-to-end machine learning on Apache Spark.☆475Apr 18, 2017Updated 8 years ago
- Distributed Deep Learning on Spark☆403Oct 8, 2016Updated 9 years ago
- Livy is an open source REST interface for interacting with Apache Spark from anywhere☆1,007Oct 5, 2022Updated 3 years ago
- ☆525Updated this week
- Dependency and data pipeline management framework for Spark and Scala☆15Apr 8, 2017Updated 8 years ago
- Makes a bunch of EC2 spot priced instances and starts dask running on them.☆13Jun 18, 2018Updated 7 years ago
- Deploy Spark cluster in an easy way.☆75Sep 13, 2016Updated 9 years ago
- TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.☆3,859Jul 10, 2023Updated 2 years ago
- Examples for Fast Data Processing with Spark☆59Sep 10, 2013Updated 12 years ago
- Benchmarks of the H2O Ensemble R interface (H2O 2.0).☆14Nov 4, 2020Updated 5 years ago
- Visualize statistics from the MOOC "Functional Programming Principles in Scala" using Scala!☆202Mar 31, 2014Updated 11 years ago
- Tutorials and samples that show you how to get the most out of IBM Analytics for Apache Spark☆78Mar 16, 2018Updated 7 years ago
- Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks☆1,666Mar 16, 2024Updated last year
- Benchmark Suite for Apache Spark☆240Apr 12, 2023Updated 2 years ago
- PySpark + Scikit-learn = Sparkit-learn☆1,151Dec 31, 2020Updated 5 years ago
- ☆762Mar 11, 2021Updated 4 years ago
- Hadoop output committers for S3☆113Jul 9, 2020Updated 5 years ago
- A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support☆261Nov 3, 2017Updated 8 years ago
- Functional, Typesafe, Declarative Data Pipelines☆140Jan 29, 2018Updated 8 years ago
- Code to munge data between Kaggle .tsv Rotten Tomatoes Sentiment Analysis data set and Vowpal Wabbit☆24Jun 22, 2014Updated 11 years ago
- CLI tool to launch Spark jobs on AWS EMR☆67Oct 18, 2023Updated 2 years ago
- DynamoDB data source for Apache Spark☆95Sep 2, 2021Updated 4 years ago
- A library for time series analysis on Apache Spark☆1,196Oct 13, 2020Updated 5 years ago
- A free tutorial for Apache Spark.☆992Jan 5, 2026Updated 2 months ago
- Spark ML Lib serving library☆48May 29, 2018Updated 7 years ago
- An example of running Apache Spark using Scala in ipython notebook☆140Aug 31, 2015Updated 10 years ago