vincentnam / docker_datalake
Datalake
☆31Updated this week
Alternatives and similar repositories for docker_datalake
Users that are interested in docker_datalake are comparing it to the libraries listed below
- apache-nifi-templates☆54Updated 4 years ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data☆49Updated 2 years ago
- Micro Data Lake based on Docker Compose☆17Updated 5 years ago
- Zeppelin docker☆16Updated 5 years ago
- Build Data Lake using Open Source tools☆119Updated 8 months ago
- Main TDP repository☆58Updated last week
- Infrastructure for Big Data: Hadoop + NiFi + Spark + Hive using Docker☆20Updated last month
- ☆91Updated 3 years ago
- EverythingApacheNiFi☆116Updated 2 years ago
- Youtube Apache NiFi 2022 Series resources☆90Updated 2 years ago
- Repository for Docker Image of Apache-Superset. [Docker Image: https://hub.docker.com/r/abhioncbr/docker-superset]☆105Updated 4 years ago
- Trino On K8S Via Helm & Metastore Workshop Querying Delta Tables☆12Updated last year
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.☆506Updated 3 months ago
- Template to spin up delta lake locally using docker☆23Updated 2 years ago
- Repository for building docker image, with open-source applications☆26Updated last year
- ☆46Updated 2 years ago
- Apache Airflow in Docker Compose (for both versions 1.10.* and 2.*)☆184Updated 2 years ago
- Multi-container environment with Hadoop, Spark and Hive☆232Updated 9 months ago
- Collection of NiFi-related stuff☆25Updated 3 years ago
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines☆135Updated 3 years ago
- Full stack data engineering tools and infrastructure set-up☆57Updated 4 years ago
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....☆80Updated this week
- Apache NiFi cluster running in Kubernetes☆61Updated 3 weeks ago
- ☆27Updated last year
- Dockerizing an Apache Spark Standalone Cluster☆42Updated 3 years ago
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar…☆42Updated 2 years ago
- Data Engineering Projects using Mage.ai as orchestrator☆19Updated 2 weeks ago
- A Python package to submit and manage Apache Spark applications on Kubernetes.☆46Updated 6 months ago
- Data Engineering examples for Airflow, Prefect; dbt for BigQuery, Redshift, ClickHouse, Postgres, DuckDB; PySpark for Batch processing; K…☆69Updated last week
- The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Pos…☆76Updated 2 years ago