brunocfnba / docker-spark-cluster
Set up a 3 node spark cluster using docker containers
☆34Updated 7 years ago
Alternatives and similar repositories for docker-spark-cluster
Users that are interested in docker-spark-cluster are comparing it to the libraries listed below
- An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR☆174Updated last week
- spark on kubernetes☆104Updated 2 years ago
- Data validation library for PySpark 3.0.0☆33Updated 2 years ago
- How to manage Slowly Changing Dimensions with Apache Hive☆55Updated 5 years ago
- Apache Spark docker container image (Standalone mode)☆35Updated 4 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform☆47Updated 5 months ago
- Airflow training for the crunch conf☆105Updated 6 years ago
- Docker container for Kafka - Spark Streaming - Cassandra☆98Updated 5 years ago
- ☆27Updated last year
- Use Airflow to move data from multiple MySQL databases to BigQuery☆100Updated 4 years ago
- pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4☆69Updated 7 months ago
- Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops☆117Updated 2 years ago
- ☆72Updated 4 years ago
- ☆199Updated last year
- Demonstrates calling a Scala UDF from Python using spark-submit with an EGG and JAR☆21Updated 5 years ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0☆98Updated 2 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.☆38Updated 2 years ago
- Spark and Hive docker containers sharing a common MySQL metastore☆26Updated 5 years ago
- Dockerizing an Apache Spark Standalone Cluster☆43Updated 2 years ago
- Ambari stack service for installing and managing Apache Airflow on HDP cluster☆59Updated 6 years ago
- Cloud-native Trino (prestosql) + Hive + Minio + Superset☆23Updated 3 years ago
- Spark style guide☆259Updated 8 months ago
- ☆23Updated 4 years ago
- ☆37Updated 6 years ago
- Examples for High Performance Spark☆15Updated 7 months ago
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin…☆24Updated last year
- Spark app to merge different schemas☆23Updated 4 years ago
- Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra☆85Updated 8 years ago
- Docker image to submit Spark applications☆38Updated 7 years ago
- Real-world Spark pipelines examples☆83Updated 7 years ago