jware-solutions / docker-big-data-cluster
A ready-to-go Big Data cluster (Hadoop + Hadoop Streaming + Spark + PySpark) with Docker and Docker Swarm!
☆20 Updated 2 weeks ago
Alternatives and similar repositories for docker-big-data-cluster
Users who are interested in docker-big-data-cluster are comparing it to the repositories listed below
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines ☆130 Updated 2 years ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆46 Updated last year
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆69 Updated 4 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆124 Updated last week
- Official Dockerfile for Apache Spark ☆137 Updated last week
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆125 Updated 2 weeks ago
- Cloud-native Trino (prestosql) + Hive + Minio + Superset ☆23 Updated 3 years ago
- A general purpose framework for automating Cloudera Products ☆66 Updated 3 months ago
- Multi-container environment with Hadoop, Spark and Hive ☆217 Updated last month
- Docker multi-nodes Hadoop cluster with Spark 2.4.1 on Yarn ☆51 Updated 4 years ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆489 Updated 2 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆38 Updated 4 years ago
- spark on kubernetes ☆104 Updated 2 years ago
- Hadoop-Hive-Spark cluster + Jupyter on Docker ☆76 Updated 5 months ago
- A simple Spark-powered ETL framework that just works 🍺 ☆181 Updated last month
- Edge2AI Workshop ☆69 Updated 3 weeks ago
- Docker with Airflow and Spark standalone cluster ☆256 Updated last year
- Spark style guide ☆259 Updated 8 months ago
- CSD for Apache Airflow ☆20 Updated 5 years ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆94 Updated last week
- Delta Lake examples ☆225 Updated 7 months ago
- Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi ☆113 Updated last year
- Datagenerator for Data Services ☆16 Updated 5 months ago
- One Click Script to Deploy CDP (CDP PvC & HDP & CDH) ☆31 Updated 3 weeks ago
- The source code for the book Modern Data Engineering with Apache Spark ☆36 Updated 2 years ago
- ETL pipeline using pyspark (Spark - Python) ☆116 Updated 5 years ago