jware-solutions / docker-big-data-cluster
A ready-to-go Big Data cluster (Hadoop + Hadoop Streaming + Spark + PySpark) with Docker and Docker Swarm!
☆19Updated 3 months ago
Alternatives and similar repositories for docker-big-data-cluster:
Users who are interested in docker-big-data-cluster are comparing it to the libraries listed below:
- One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)☆120Updated 3 years ago
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines☆130Updated 2 years ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data☆45Updated last year
- Dockerizing an Apache Spark Standalone Cluster☆43Updated 2 years ago
- Docker multi-node Hadoop cluster with Spark 2.4.1 on YARN☆51Updated 4 years ago
- Data generator for Data Services☆16Updated 4 months ago
- Ambari stack service for installing and managing Apache Airflow on HDP cluster☆59Updated 6 years ago
- Docker with Airflow and Spark standalone cluster☆255Updated last year
- Base Docker image with just essentials: Hadoop, Hive and Spark.☆68Updated 4 years ago
- Infrastructure automation to deploy Hadoop, Hive, Spark, and Airflow nodes on a Docker host☆20Updated 6 years ago
- Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file.☆164Updated 4 years ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆67Updated last year
- Delta-Lake, ETL, Spark, Airflow☆47Updated 2 years ago
- ☆264Updated 6 months ago
- ☆20Updated last year
- Zeppelin docker☆15Updated 4 years ago
- DBImport ingestion tool. Handle import, export and standard ETL flows in Hadoop/Hive☆18Updated 2 months ago
- ☆27Updated last year
- The source code for the book Modern Data Engineering with Apache Spark☆36Updated 2 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment.☆38Updated 4 years ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.☆488Updated 2 years ago
- ☆92Updated 2 years ago
- ☆25Updated 4 years ago
- Atlas custom type definitions☆16Updated 3 years ago
- Pentaho plugin for Apache Airflow - Orchestrate Pentaho transformations and jobs from Airflow☆39Updated 9 months ago
- PostgreSQL configured to work as a metastore for Hive.☆32Updated 2 years ago
- Delta Lake examples☆224Updated 6 months ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0☆98Updated 2 years ago
- Guide for Databricks Spark certification☆58Updated 3 years ago
- ☆23Updated 4 years ago