jware-solutions / docker-big-data-cluster
A ready-to-go Big Data cluster (Hadoop + Hadoop Streaming + Spark + PySpark) with Docker and Docker Swarm!
☆19 Updated 2 weeks ago
Alternatives and similar repositories for docker-big-data-cluster:
Users who are interested in docker-big-data-cluster are comparing it to the repositories listed below:
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆68 Updated 3 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆114 Updated this week
- A simple Spark-powered ETL framework that just works 🍺 ☆178 Updated last year
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆41 Updated last year
- Spark on Kubernetes ☆105 Updated last year
- Multi-node Hadoop cluster on Docker with Spark 2.4.1 on YARN ☆50 Updated 4 years ago
- Example to create lineage in Atlas with Sqoop and Spark ☆14 Updated 7 years ago
- Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker Compose file. ☆161 Updated 3 years ago
- Docker with Airflow and Spark standalone cluster ☆247 Updated last year
- ☆25 Updated last year
- Guide for Databricks Spark certification ☆58 Updated 3 years ago
- A Spark cluster setup running on Docker containers ☆60 Updated 5 years ago
- A general purpose framework for automating Cloudera Products ☆66 Updated last month
- Quick Guides from Dremio on Several topics ☆67 Updated this week
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines ☆126 Updated 2 years ago
- Databricks - Apache Spark™ - 2X Certified Developer ☆265 Updated 4 years ago
- ☆91 Updated 2 years ago
- The official repository for the Rock the JVM Spark Optimization with Scala course ☆57 Updated last year
- The Internals of Spark on Kubernetes ☆70 Updated 2 years ago
- Spark Structured Streaming examples using version 3.5.1 ☆26 Updated 8 months ago
- Zeppelin Docker ☆15 Updated 4 years ago
- Infrastructure automation to deploy Hadoop, Hive, Spark and Airflow nodes on a Docker host ☆20 Updated 6 years ago
- ETL pipeline using PySpark (Spark - Python) ☆112 Updated 4 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆44 Updated 4 years ago
- Apache Atlas built from the latest release source tarball and patched to run in a Docker container. ☆139 Updated last year
- ☆32 Updated 6 years ago
- Delta-Lake, ETL, Spark, Airflow ☆45 Updated 2 years ago
- Multi-container environment with Hadoop, Spark and Hive ☆204 Updated last year
- The source code for the book Modern Data Engineering with Apache Spark ☆34 Updated 2 years ago