brunocfnba / docker-spark-cluster
Set up a 3 node spark cluster using docker containers
☆33 · Updated 6 years ago
Alternatives and similar repositories for docker-spark-cluster:
Users who are interested in docker-spark-cluster are comparing it to the repositories listed below.
- Repo for all the code from the articles I post on Medium ☆108 · Updated 2 years ago
- An Airflow Docker image preconfigured to work well with Spark and Hadoop/EMR ☆174 · Updated last year
- Use Airflow to move data from multiple MySQL databases to BigQuery ☆100 · Updated 4 years ago
- Spark on Kubernetes ☆105 · Updated 2 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 · Updated 5 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 · Updated last month
- pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible with Spark 2.0, 2.1, 2.2, 2.3 and 2.4 ☆69 · Updated 4 months ago
- Docker image to submit Spark applications ☆38 · Updated 7 years ago
- A boilerplate for writing PySpark jobs ☆397 · Updated last year
- Spark on Kubernetes using Helm ☆34 · Updated 4 years ago
- Example unit tests for Apache Spark Python scripts using the py.test framework ☆84 · Updated 8 years ago
- Spark and Hive Docker containers sharing a common MySQL metastore ☆26 · Updated 4 years ago
- Docker container for Kafka - Spark Streaming - Cassandra ☆97 · Updated 5 years ago
- Data validation library for PySpark 3.0.0 ☆34 · Updated 2 years ago
- This project describes how to write a full ETL data pipeline using Spark. ☆15 · Updated 2 years ago
- Helping you get Airflow running in production. ☆9 · Updated 5 years ago
- Asynchronous actions for PySpark ☆47 · Updated 3 years ago
- Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops ☆118 · Updated last year
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin… ☆24 · Updated last year
- Airflow basics tutorial ☆397 · Updated 3 years ago
- Deploy your Spark production cluster on Kubernetes ☆47 · Updated 4 years ago
- Airflow training for the Crunch conference ☆105 · Updated 6 years ago
- Multi-node Presto cluster on Docker containers ☆124 · Updated 2 years ago
- Example for the article "Running Spark 3 with standalone Hive Metastore 3.0" ☆97 · Updated 2 years ago
- Infrastructure automation to deploy Hadoop, Hive, Spark and Airflow nodes on a Docker host ☆20 · Updated 6 years ago
- A plugin for Apache Airflow that lets you run Spark Submit commands as an Operator ☆73 · Updated 5 years ago