Wittline / docker-livy
Dockerizing and Consuming an Apache Livy environment
☆12Updated 2 years ago
Alternatives and similar repositories for docker-livy
Users who are interested in docker-livy are comparing it to the repositories listed below
- ☆26Updated last year
- Docker with Airflow and Spark standalone cluster☆256Updated last year
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for …☆136Updated 5 years ago
- Apache Spark 3 - Structured Streaming Course Material☆121Updated last year
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆55Updated 2 years ago
- ☆87Updated 2 years ago
- Dockerizing an Apache Spark Standalone Cluster☆43Updated 2 years ago
- Course notes for the Astronomer Certification DAG Authoring for Apache Airflow☆53Updated last year
- End to end data engineering project☆56Updated 2 years ago
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR☆84Updated 5 years ago
- Ravi Azure ADB ADF Repository☆66Updated 4 months ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow☆146Updated 4 years ago
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS☆46Updated 5 years ago
- ETL pipeline using pyspark (Spark - Python)☆116Updated 5 years ago
- ☆12Updated 4 years ago
- Delta-Lake, ETL, Spark, Airflow☆47Updated 2 years ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆71Updated last year
- An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit☆19Updated 2 years ago
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow☆33Updated 4 years ago
- Spark data pipeline that processes movie ratings data.☆28Updated this week
- Simple stream processing pipeline☆103Updated 11 months ago
- Near real time ETL to populate a dashboard.☆72Updated 11 months ago
- Code for dbt tutorial☆157Updated last year
- PySpark Cheatsheet☆36Updated 2 years ago
- RedditR for Content Engagement and Recommendation☆21Updated 7 years ago
- This repo contains commands that data engineers use in day to day work.☆61Updated 2 years ago
- Challenge Data Engineer☆25Updated 2 years ago
- A data engineering project with Airflow, dbt, Terraform, GCP and much more!☆25Updated 2 years ago
- ☆132Updated 3 months ago
- A template repository to create a data project with IAC, CI/CD, Data migrations, & testing☆262Updated 10 months ago