Pathairush / airflow_hive_spark_sqoop
A Docker setup that runs Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop)
☆12 · Updated 4 years ago
Alternatives and similar repositories for airflow_hive_spark_sqoop
Users who are interested in airflow_hive_spark_sqoop are comparing it to the repositories listed below
- Hadoop-Hive-Spark cluster + Jupyter on Docker ☆76 · Updated 7 months ago
- Docker with Airflow and Spark standalone cluster ☆261 · Updated 2 years ago
- A simplified, lightweight ETL Framework based on Apache Spark ☆589 · Updated last year
- Delta Lake examples ☆227 · Updated 10 months ago
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆70 · Updated 4 years ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆496 · Updated 2 years ago
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆174 · Updated last year
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆47 · Updated last year
- Spark style guide ☆260 · Updated 10 months ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 · Updated 3 years ago
- This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala language ☆566 · Updated last year
- Spark and Hive docker containers sharing a common MySQL metastore ☆26 · Updated 5 years ago
- ☆267 · Updated 9 months ago
- dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks ☆440 · Updated 3 weeks ago
- Docker multi-nodes Hadoop cluster with Spark 2.4.1 on Yarn ☆51 · Updated 4 years ago
- ETL pipeline using pyspark (Spark - Python) ☆117 · Updated 5 years ago
- Delta Lake helper methods in PySpark ☆325 · Updated 11 months ago
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow ☆34 · Updated 4 years ago
- A Python Library to support running data quality rules while the spark job is running⚡ ☆189 · Updated this week
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆75 · Updated 3 years ago
- dbt Cloud pipelines in airflow examples ☆36 · Updated last year
- Spark Examples ☆125 · Updated 3 years ago
- PySpark test helper methods with beautiful error messages ☆709 · Updated last week
- A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow. ☆204 · Updated 2 weeks ago
- ☆14 · Updated 2 years ago
- The Internals of Spark SQL ☆472 · Updated last week
- The Internals of Delta Lake ☆184 · Updated 7 months ago
- Life-cycle: Internal working of HDFS, SQOOP, HIVE, SPARK, HBASE, KAFKA with code. ☆15 · Updated 5 years ago
- Apache Airflow integration for dbt ☆410 · Updated last year
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian ☆219 · Updated 2 years ago