jware-solutions / docker-big-data-cluster
A ready-to-go Big Data cluster (Hadoop + Hadoop Streaming + Spark + PySpark) with Docker and Docker Swarm!
☆18 Updated 2 months ago
Related projects
Alternatives and complementary repositories for docker-big-data-cluster
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Zeppelin docker ☆15 Updated 4 years ago
- ☆90 Updated 2 years ago
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆67 Updated 3 years ago
- ☆25 Updated last year
- ETL pipeline using PySpark (Spark - Python) ☆108 Updated 4 years ago
- Multi-container environment with Hadoop, Spark and Hive ☆203 Updated 10 months ago
- Spark on Kubernetes ☆105 Updated last year
- ☆27 Updated 9 months ago
- Guide for Databricks Spark certification ☆58 Updated 3 years ago
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines ☆123 Updated 2 years ago
- Docker multi-node Hadoop cluster with Spark 2.4.1 on YARN ☆50 Updated 3 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆119 Updated last year
- Infrastructure automation to deploy Hadoop, Hive, Spark, and Airflow nodes on a Docker host ☆20 Updated 5 years ago
- A general purpose framework for automating Cloudera Products ☆64 Updated this week
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 Updated 5 years ago
- Delta Lake, ETL, Spark, Airflow ☆44 Updated 2 years ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark) ☆41 Updated 5 years ago
- Edge2AI Workshop ☆68 Updated 3 weeks ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- Spark and Delta Lake Workshop ☆22 Updated 2 years ago
- Examples of Spark 3.0 ☆47 Updated 4 years ago
- ☆111 Updated 4 years ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆111 Updated this week
- Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker Compose file. ☆161 Updated 3 years ago
- The source code for the book Modern Data Engineering with Apache Spark ☆33 Updated 2 years ago
- Educational notes and hands-on problems with solutions for the Hadoop ecosystem ☆86 Updated 5 years ago
- Docker with Airflow and Spark standalone cluster ☆246 Updated last year
- A demo of using Kafka, Spark, Hive, Cassandra, etc. with Docker. It produces a production-ready environment for any kind of big d… ☆31 Updated 5 years ago