Pathairush / airflow_hive_spark_sqoop
A Docker setup running Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop)
☆12 · Updated 4 years ago
Alternatives and similar repositories for airflow_hive_spark_sqoop
Users interested in airflow_hive_spark_sqoop are comparing it to the repositories listed below.
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆47 · Updated last year
- Hadoop-Hive-Spark cluster + Jupyter on Docker ☆78 · Updated 8 months ago
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆70 · Updated 4 years ago
- Docker with Airflow and Spark standalone cluster ☆261 · Updated 2 years ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆495 · Updated 2 years ago
- A simplified, lightweight ETL framework based on Apache Spark ☆589 · Updated last year
- Docker multi-node Hadoop cluster with Spark 2.4.1 on YARN ☆51 · Updated 4 years ago
- ☆267 · Updated 10 months ago
- Simple repo to demonstrate how to submit a Spark job to EMR from Airflow ☆34 · Updated 4 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 · Updated 3 years ago
- Multi-container environment with Hadoop, Spark and Hive ☆221 · Updated 4 months ago
- Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file. ☆167 · Updated 4 years ago
- This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala language ☆567 · Updated last year
- Spark and Hive Docker containers sharing a common MySQL metastore ☆26 · Updated 5 years ago
- ETL pipeline using PySpark (Spark + Python) ☆116 · Updated 5 years ago
- Multi-stage, config-driven, SQL-based ETL framework using PySpark ☆26 · Updated 5 years ago
- Delta Lake examples ☆227 · Updated 10 months ago
- Spark Examples ☆125 · Updated 3 years ago
- Databricks - Apache Spark™ - 2X Certified Developer ☆265 · Updated 5 years ago
- Life-cycle: Internal workings of HDFS, Sqoop, Hive, Spark, HBase, and Kafka, with code. ☆15 · Updated 5 years ago
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆175 · Updated 2 years ago
- ☆14 · Updated 2 years ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster ☆476 · Updated 10 months ago
- Spark on Kubernetes using Helm ☆34 · Updated 5 years ago
- Examples of Spark 3.0 ☆46 · Updated 4 years ago
- The Internals of Spark SQL ☆473 · Updated this week
- Apache Spark Course Material ☆94 · Updated 2 years ago
- Spark style guide ☆262 · Updated 11 months ago
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python… ☆44 · Updated last year
- Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi ☆117 · Updated last year