edbullen / DockerSpark245
Spark cluster in Docker containers with sample training Jupyter notebooks
☆25 Updated last year
Related projects
Alternatives and complementary repositories for DockerSpark245
- Docker with Airflow and Spark standalone cluster ☆246 Updated last year
- End-to-end Kafka Streaming Examples on Databricks with Evolving Avro Schemas. ☆9 Updated 8 months ago
- ☆32 Updated last year
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆39 Updated 3 years ago
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines ☆123 Updated 2 years ago
- Delta Lake, ETL, Spark, Airflow ☆44 Updated 2 years ago
- The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Pos… ☆55 Updated last year
- ☆43 Updated last year
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D… ☆29 Updated 4 years ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆463 Updated last year
- Produce Kafka messages, consume them, and upload them into Cassandra and MongoDB. ☆37 Updated last year
- Multi-container environment with Hadoop, Spark and Hive ☆203 Updated 10 months ago
- Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks ☆21 Updated 2 years ago
- In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO,… ☆10 Updated last year
- Delta Lake examples ☆208 Updated last month
- Create streaming data, transfer it to Kafka, modify it with PySpark, and load it into Elasticsearch and MinIO ☆57 Updated last year
- Simple repo to demonstrate how to submit a Spark job to EMR from Airflow ☆32 Updated 4 years ago
- Examples surrounding Databricks. ☆56 Updated 4 months ago
- ETL pipeline using PySpark (Spark - Python) ☆108 Updated 4 years ago
- Guide for the Databricks Spark certification ☆58 Updated 3 years ago
- ☆86 Updated 2 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆119 Updated last year
- ☆27 Updated last year
- Infrastructure for Big Data: Hadoop + NiFi + Spark + Hive using Docker ☆19 Updated last year
- This repo contains a Spark standalone cluster on Docker for anyone who wants to play with PySpark by submitting their applications. ☆23 Updated last year
- ☆90 Updated 2 years ago
- Productionalizing Data Pipelines with Apache Airflow ☆111 Updated 2 years ago
- Unit testing using Databricks Connect ☆30 Updated 3 years ago
- Resources for the preparation course for the Databricks Data Engineer Professional certification exam ☆86 Updated last month