Wittline / docker-livy
Dockerizing and Consuming an Apache Livy environment
☆11 Updated 2 years ago
Alternatives and similar repositories for docker-livy:
Users who are interested in docker-livy are comparing it to the repositories listed below:
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆39 Updated 3 years ago
- End to end data engineering project ☆53 Updated 2 years ago
- Docker with Airflow and Spark standalone cluster ☆247 Updated last year
- Materials for the next course ☆24 Updated last year
- PySpark Cheatsheet ☆36 Updated 2 years ago
- ☆87 Updated 2 years ago
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow ☆32 Updated 4 years ago
- Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta lake, Postgres + Debezium CDC, MySQL,… ☆29 Updated last week
- Spark all the ETL Pipelines ☆32 Updated last year
- ETL pipeline using pyspark (Spark - Python) ☆112 Updated 4 years ago
- Delta-Lake, ETL, Spark, Airflow ☆46 Updated 2 years ago
- Spark development environment for kubernetes, spark-submit and jupyter notebook ☆19 Updated 3 years ago
- ☆25 Updated last year
- Delta Lake examples ☆214 Updated 3 months ago
- Delta Lake Documentation ☆48 Updated 7 months ago
- ☆23 Updated 4 years ago
- how to unit test your PySpark code ☆28 Updated 3 years ago
- Quick Guides from Dremio on Several topics ☆67 Updated last week
- Covid19 and Iowa Liquor Sales analysis at BigQuery using dbt, Airflow, Marquez, Google Cloud and other modern data stack tools ☆14 Updated 2 years ago
- Simple stream processing pipeline ☆96 Updated 7 months ago
- ☆14 Updated 5 years ago
- Challenge Data Engineer ☆25 Updated 2 years ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆135 Updated 4 years ago
- ☆9 Updated last month
- This repo contains commands that data engineers use in day to day work. ☆60 Updated last year
- Data Engineering with Spark and Delta Lake ☆94 Updated 2 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆43 Updated 5 years ago
- Spark data pipeline that processes movie ratings data. ☆27 Updated this week