Pathairush / airflow_hive_spark_sqoop

A Docker setup running Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop)

☆12

Alternatives and similar repositories for airflow_hive_spark_sqoop

Users that are interested in airflow_hive_spark_sqoop are comparing it to the libraries listed below

cordon-thiago / airflow-spark
Docker with Airflow and Spark standalone cluster
☆261 Updated 2 years ago
cluster-apps-on-docker / spark-standalone-cluster-on-docker
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.
☆494 Updated 2 years ago
panovvv / hadoop-hive-spark-docker
Base Docker image with just essentials: Hadoop, Hive and Spark.
☆70 Updated 4 years ago
myamafuj / hadoop-hive-spark-docker
Hadoop-Hive-Spark cluster + Jupyter on Docker
☆80 Updated 9 months ago
josephmachado / spark_submit_airflow
Simple repo to demonstrate how to submit a spark job to EMR from Airflow
☆34 Updated 5 years ago
ysfesr / Building-Data-LakeHouse
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
☆48 Updated last year
astronomer / airflow-dbt-demo
A repository of sample code to accompany our blog post on Airflow and dbt.
☆179 Updated 2 years ago
sdesilva26 / docker-spark
Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines
☆134 Updated 2 years ago
dominikhei / Local-Data-LakeHouse
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…
☆74 Updated 2 years ago
bitsondatadev / trino-getting-started
☆269 Updated last year
spark-examples / spark-scala-examples
This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala language
☆567 Updated last year
1ambda / lakehouse
Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)
☆61 Updated 2 years ago
YotpoLtd / metorikku
A simplified, lightweight ETL Framework based on Apache Spark
☆588 Updated last year
Wittline / apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
☆43 Updated 3 years ago
Marcel-Jan / docker-hadoop-spark
Multi-container environment with Hadoop, Spark and Hive
☆225 Updated 5 months ago
delta-io / delta-examples
Delta Lake examples
☆230 Updated last year
cartershanklin / pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
☆479 Updated last year
zsvoboda / ngods-stocks
New Generation Opensource Data Stack Demo
☆449 Updated 2 years ago
dbt-labs / dbt-spark
This repository has moved into https://github.com/dbt-labs/dbt-adapters
☆442 Updated 3 months ago
mahdyne / pyspark-tut
☆23 Updated 4 years ago
josephmachado / beginner_de_project_stream
Simple stream processing pipeline
☆110 Updated last year
mahmoudparsian / data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
☆222 Updated 2 years ago
amesar / docker-spark-hive-metastore
Spark and Hive docker containers sharing a common MySQL metastore
☆26 Updated 5 years ago
arempter / hive-metastore-docker
Example for article Running Spark 3 with standalone Hive Metastore 3.0
☆102 Updated 2 years ago
mvillarrealb / docker-spark-cluster
A simple Spark standalone cluster for your testing environment purposes
☆570 Updated last year
vim89 / datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…
☆55 Updated 2 years ago
vivek-bombatkar / Databricks-Apache-Spark-2X-Certified-Developer
Databricks - Apache Spark™ - 2X Certified Developer
☆265 Updated 5 years ago
vivek-bombatkar / Spark-with-Python---My-learning-notes-
ETL pipeline using pyspark (Spark - Python)
☆116 Updated 5 years ago
Jayvardhan-Reddy / BigData-Ecosystem-Architecture
Life-cycle: Internal working of HDFS, SQOOP, HIVE, SPARK, HBASE, KAFKA with code.
☆15 Updated 6 years ago
GoogleCloudDataproc / spark-bigquery-connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
☆409 Updated last week