dimajix / docker-jupyter-spark
Docker image for Jupyter notebooks with PySpark
☆27 Updated 6 years ago
Alternatives and similar repositories for docker-jupyter-spark:
Users who are interested in docker-jupyter-spark are comparing it to the repositories listed below.
- Airflow training for the crunch conf ☆104 Updated 6 years ago
- An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR ☆174 Updated last year
- One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager) ☆119 Updated 3 years ago
- spark on kubernetes ☆105 Updated last year
- Repository used for Spark Trainings ☆53 Updated last year
- Docker multi-nodes Hadoop cluster with Spark 2.4.1 on Yarn ☆50 Updated 4 years ago
- Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks ☆35 Updated 4 years ago
- How to build an awesome data engineering team ☆99 Updated 5 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- Real-world Spark pipelines examples ☆83 Updated 6 years ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆74 Updated last year
- Spark style guide ☆257 Updated 4 months ago
- A boilerplate for writing PySpark Jobs ☆396 Updated last year
- Delta-Lake, ETL, Spark, Airflow ☆46 Updated 2 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 Updated 5 years ago
- Apache Spark Course Material ☆86 Updated last year
- The official repository for the Rock the JVM Spark Optimization 2 course ☆38 Updated last year
- ☆197 Updated last year
- Spark Examples ☆125 Updated 3 years ago
- Sample Airflow DAGs ☆62 Updated 2 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆121 Updated last year
- Data Engineering with Spark and Delta Lake ☆94 Updated 2 years ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆97 Updated 2 years ago
- Pyspark boilerplate for running prod ready data pipeline ☆28 Updated 3 years ago
- 🐍💨 Airflow tutorial for PyCon 2019 ☆85 Updated 2 years ago
- PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2 ☆83 Updated 5 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆39 Updated 3 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- ETL pipeline using pyspark (Spark - Python) ☆112 Updated 4 years ago
- Apache (Py)Spark type annotations (stub files). ☆115 Updated 2 years ago