edbullen / DockerSpark245
Spark cluster in docker containers with sample training Jupyter notebooks
☆27 Updated last year
Alternatives and similar repositories for DockerSpark245:
Users that are interested in DockerSpark245 are comparing it to the libraries listed below
- Delta-Lake, ETL, Spark, Airflow ☆46 Updated 2 years ago
- Docker with Airflow and Spark standalone cluster ☆249 Updated last year
- ☆35 Updated 2 years ago
- Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks ☆22 Updated 2 years ago
- This project is for demonstrating knowledge of Data Engineering tools and concepts and also learning in the process ☆46 Updated 2 years ago
- A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, and GCP! ☆11 Updated last year
- ☆12 Updated 3 years ago
- ☆87 Updated 2 years ago
- ☆41 Updated 7 months ago
- Guide for Databricks Spark certification ☆58 Updated 3 years ago
- ☆46 Updated last year
- The resources of the preparation course for Databricks Data Engineer Professional certification exam ☆102 Updated this week
- ☆79 Updated 2 weeks ago
- End-to-end Kafka Streaming Examples on Databricks with Evolving Avro Schemas. ☆9 Updated 11 months ago
- Code snippets for Data Engineering Design Patterns book ☆69 Updated 2 weeks ago
- The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Pos… ☆58 Updated last year
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆172 Updated 3 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆121 Updated last year
- ☆10 Updated this week
- A Microsoft Power BI Custom Connector allowing you to import Trino data into Power BI. ☆62 Updated last month
- This repo contains a spark standalone cluster on docker for anyone who wants to play with PySpark by submitting their applications. ☆30 Updated last year
- Uses Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline ☆27 Updated last year
- A series on learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems ☆47 Updated last year
- ☆15 Updated last year
- 📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems. ☆36 Updated last month
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆39 Updated 3 years ago
- Near real time ETL to populate a dashboard. ☆73 Updated 8 months ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆478 Updated 2 years ago
- A Docker Compose template that builds an interactive development environment for PySpark with Jupyter Lab, MinIO as object storage, Hive M… ☆40 Updated 2 months ago
- Delta Lake examples ☆217 Updated 4 months ago