heirinsinho / all-in-one-docker-bigdataops
all-in-one-docker-bigdataops is a comprehensive Docker Compose environment that simplifies Big Data operations by bundling Hadoop, Spark, Hive, Hue, and Airflow into a ready-to-run stack. It ships with example workflows, quick setup, and easy customization, making it well suited to learning, development, and testing in Big DataOps.
☆21 · Updated 9 months ago
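The description above covers a single docker-compose stack bundling Hadoop, Spark, and Airflow. As a rough sketch of what such a compose file can look like (service names, images, and ports here are illustrative assumptions, not the repository's actual configuration):

```yaml
# Illustrative sketch only: images, tags, and ports are assumptions,
# not taken from all-in-one-docker-bigdataops itself.
services:
  namenode:
    image: apache/hadoop:3          # HDFS NameNode
    ports:
      - "9870:9870"                 # HDFS web UI
  spark:
    image: apache/spark:3.5.0       # Spark in a single container
    ports:
      - "8080:8080"                 # Spark master UI
  airflow:
    image: apache/airflow:2.9.0
    command: standalone             # all-in-one Airflow for local testing
    ports:
      - "8081:8080"                 # Airflow web UI, remapped to avoid a port clash
```

A stack like this is brought up with `docker compose up -d` and torn down (including volumes) with `docker compose down -v`.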
Alternatives and similar repositories for all-in-one-docker-bigdataops
Users interested in all-in-one-docker-bigdataops are comparing it to the repositories listed below.
- Hadoop-Hive-Spark cluster + Jupyter on Docker ☆80 · Updated 10 months ago
- The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Pos… ☆74 · Updated 2 years ago
- Repository for building docker image, with open-source applications ☆26 · Updated last year
- Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file. ☆169 · Updated 4 years ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆63 · Updated 2 years ago
- Vanna AI Streamlit App ☆325 · Updated last year
- Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi ☆118 · Updated last year
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆69 · Updated 4 years ago
- Cloud-native Trino (prestosql) + Hive + Minio + Superset ☆24 · Updated 4 years ago
- GitHub Pages documenting Open Data Mesh Platform ☆14 · Updated 3 weeks ago
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a… ☆42 · Updated last year
- Delta Lake examples ☆233 · Updated last year
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆500 · Updated 3 weeks ago
- Apache DolphinScheduler Python API, aka PyDolphinscheduler. ☆62 · Updated 4 months ago
- Postgresql configured to work as metastore for Hive. ☆32 · Updated 2 years ago
- ☆269 · Updated last year
- Airflow Examples: code samples for Medium articles ☆14 · Updated 4 years ago
- A simple, easy-to-use ETL tool ☆17 · Updated 6 years ago
- Docker with Airflow and Spark standalone cluster ☆262 · Updated 2 years ago
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines ☆134 · Updated 3 years ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆102 · Updated 2 years ago
- Spark Streaming + kafka + hbase ☆15 · Updated 7 years ago
- a dbt adapter for Apache Doris ☆28 · Updated 2 years ago
- Building a Data Pipeline with an Open Source Stack ☆55 · Updated 5 months ago
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian ☆223 · Updated 2 years ago
- A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, and GCP! ☆12 · Updated 2 years ago
- Apache Spark with HDFS cluster within Kubernetes ☆11 · Updated 2 years ago
- This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and … ☆34 · Updated last year
- Built on Spark SQL, this tool lets you operate HBase tables with SQL statements; it currently supports querying, creating, and dropping HBase tables and inserting data (a rowKey generation rule must be specified by the user); row deletion and distributed bulk import of large datasets are under development ☆13 · Updated last year
- ☆48 · Updated 2 years ago