Wittline / docker-livy
Dockerizing and Consuming an Apache Livy environment
☆11 Updated 2 years ago
Alternatives and similar repositories for docker-livy:
Users who are interested in docker-livy are comparing it to the repositories listed below.
- ☆87 Updated 2 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆121 Updated last year
- Spark data pipeline that processes movie ratings data. ☆28 Updated this week
- Delta-Lake, ETL, Spark, Airflow ☆46 Updated 2 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Simple ETL pipeline using Python ☆25 Updated last year
- Ravi Azure ADB ADF Repository ☆65 Updated 2 months ago
- Spark development environment for kubernetes, spark-submit and jupyter notebook ☆19 Updated 3 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- ETL pipeline using pyspark (Spark - Python) ☆113 Updated 4 years ago
- ☆11 Updated 4 years ago
- End to end data engineering project ☆53 Updated 2 years ago
- ☆25 Updated last year
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… ☆56 Updated 2 years ago
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a… ☆32 Updated last year
- Quick Guides from Dremio on Several topics ☆70 Updated 2 months ago
- Simple stream processing pipeline ☆99 Updated 9 months ago
- Data Engineering with Spark and Delta Lake ☆96 Updated 2 years ago
- This project helps me to understand the core concepts of Apache Airflow. I have created custom operators to perform tasks such as staging… ☆82 Updated 5 years ago
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR ☆82 Updated 5 years ago
- Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta lake, Postgres+ Debezium CDC, MySQL,… ☆28 Updated 2 months ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆75 Updated 2 years ago
- Materials for the next course ☆24 Updated 2 years ago
- ☆14 Updated 5 years ago
- This repository contains code for Spark Streaming ☆21 Updated 4 years ago
- RedditR for Content Engagement and Recommendation ☆21 Updated 7 years ago
- Realtime Data Engineering Project ☆27 Updated 2 months ago
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆171 Updated last year
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow ☆33 Updated 4 years ago
- In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO,… ☆10 Updated last year