pavank / docker-bigdata-cluster

Infrastructure automation to deploy Hadoop, Hive, Spark, and Airflow nodes on a Docker host

☆20

Alternatives and similar repositories for docker-bigdata-cluster

Users that are interested in docker-bigdata-cluster are comparing it to the libraries listed below


myamafuj / hadoop-hive-spark-docker
Hadoop-Hive-Spark cluster + Jupyter on Docker
☆75 · Updated 6 months ago
panovvv / hadoop-hive-spark-docker
Base Docker image with just essentials: Hadoop, Hive and Spark.
☆70 · Updated 4 years ago
Pathairush / airflow_hive_spark_sqoop
A Docker setup using Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop)
☆12 · Updated 4 years ago
mapr-demos / SparkStreamingHBaseExample
Spark Streaming HBase Example
☆22 · Updated 9 years ago
curtishoward / spark-stream-kudu
Kafka, Spark Streaming, Kudu integration examples
☆17 · Updated 7 years ago
jorgeacf / dockerfiles
Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, …
☆36 · Updated 7 months ago
Wittline / apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
☆43 · Updated 3 years ago
tj--- / iceberg-demo
A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino
☆21 · Updated 3 years ago
big-data-europe / docker-hive-metastore-postgresql
PostgreSQL configured to work as a metastore for Hive.
☆32 · Updated 2 years ago
Aleksandr-Filichkin / flink-k8s
Flink image for Kubernetes that fixes the JobManager connection issue
☆26 · Updated 6 years ago
japila-books / pyspark-internals
The Internals of PySpark
☆26 · Updated 6 months ago
phatak-dev / spark-3.0-examples
Examples of Spark 3.0
☆47 · Updated 4 years ago
bcgov / nifi-atlas
A bridge to Apache Atlas for provenance metadata created in the course of using Apache NiFi
☆15 · Updated 2 years ago
izhangzhihao / Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
☆115 · Updated last year
LearningJournal / SparkProgrammingInScala
Apache Spark Course Material
☆95 · Updated 2 years ago
indiacloudtv / structuredstreamingkafkapyspark
Apache Spark Structured Streaming with Kafka using Python (PySpark)
☆40 · Updated 6 years ago
arempter / hive-metastore-docker
Example for the article "Running Spark 3 with standalone Hive Metastore 3.0"
☆99 · Updated 2 years ago
EthicalML / kafka-spark-streaming-zeppelin-docker
One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)
☆120 · Updated 3 years ago
vivek-bombatkar / Spark-with-Python---My-learning-notes-
ETL pipeline using PySpark (Spark with Python)
☆117 · Updated 5 years ago
agile-lab-dev / DataQuality
DataQuality for BigData
☆144 · Updated last year
ysfesr / Building-Data-LakeHouse
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
☆47 · Updated last year
mahmoudparsian / big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
☆158 · Updated 7 months ago
enkhalifapro / bigdata-all-in-one
A docker-compose setup containing the most common big data systems, such as Apache Hadoop, Apache Hive, Apache Spark, Jupyter, and Flink
☆27 · Updated last year
XavientInformationSystems / Data-Ingestion-Platform
☆49 · Updated 5 years ago
moyano83 / High-Performance-Spark
☆31 · Updated 5 years ago
mahmoudparsian / data-algorithms-with-spark
O'Reilly book: Data Algorithms with Spark, by Mahmoud Parsian
☆216 · Updated 2 years ago
jgperrin / net.jgp.labs.spark
Apache Spark examples exclusively in Java
☆102 · Updated 2 years ago
techmonad / spark-data-pipeline
This project describes how to write a full ETL data pipeline using Spark.
☆15 · Updated 2 years ago
rootsongjc / spark-on-kubernetes
Image building contents for running Spark standalone on Kubernetes
☆16 · Updated 5 years ago
ven2day / Bigdata-docker-sandbox
Docker Big Data Tools: This docker-compose file is configured to run multiple nodes. This is a Hadoop Cluster that contains the necessary…
☆30 · Updated 4 years ago