Pathairush / airflow_hive_spark_sqoop
A Docker setup that runs Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop)
☆11 · Updated 3 years ago
Alternatives and similar repositories for airflow_hive_spark_sqoop:
Users who are interested in airflow_hive_spark_sqoop are comparing it to the repositories listed below.
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 · Updated last year
- Dockerizing an Apache Spark Standalone Cluster ☆43 · Updated 2 years ago
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow ☆32 · Updated 4 years ago
- Docker with Airflow and Spark standalone cluster ☆247 · Updated last year
- Delta Lake examples ☆214 · Updated 3 months ago
- Infrastructure automation to deploy Hadoop, Hive, Spark, and Airflow nodes on a Docker host ☆20 · Updated 6 years ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆97 · Updated last year
- Apache Spark 3 - Structured Streaming Course Material ☆121 · Updated last year
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆41 · Updated last year
- dbt Cloud pipelines in airflow examples ☆35 · Updated last year
- Spark on Kubernetes using Helm ☆34 · Updated 4 years ago
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆68 · Updated 3 years ago
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆168 · Updated last year
- Spark and Hive docker containers sharing a common MySQL metastore ☆26 · Updated 4 years ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆58 · Updated last year
- Presto Trino with Apache Hive Postgres metastore ☆38 · Updated 4 months ago
- Materials of the Official Helm Chart Webinar ☆27 · Updated 3 years ago
- Hadoop-Hive-Spark cluster + Jupyter on Docker ☆63 · Updated 2 weeks ago
- 😈 Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆43 · Updated 5 years ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆49 · Updated last year
- Code for dbt tutorial ☆149 · Updated 7 months ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆118 · Updated last month
- Execution of DBT models using Apache Airflow through Docker Compose ☆113 · Updated 2 years ago
- Simple stream processing pipeline ☆94 · Updated 7 months ago
- A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino ☆19 · Updated 2 years ago
- Example of how to leverage Apache Spark distributed capabilities to call REST-API using a UDF ☆50 · Updated 2 years ago