dsynkov / spark-livy-on-airflow-workspace
A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment.
☆39 Updated 3 years ago
Related projects:
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow ☆31 Updated 3 years ago
- Docker with Airflow and Spark standalone cluster ☆239 Updated last year
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines ☆121 Updated last year
- Simple stream processing pipeline ☆89 Updated 3 months ago
- This repo contains a spark standalone cluster on docker for anyone who wants to play with PySpark by submitting their applications. ☆22 Updated last year
- Materials of the Official Helm Chart Webinar ☆26 Updated 3 years ago
- ☆38 Updated this week
- ☆30 Updated last year
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆167 Updated 10 months ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆71 Updated last year
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆166 Updated 2 years ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆43 Updated last year
- Course notes for the Astronomer Certification DAG Authoring for Apache Airflow ☆44 Updated 5 months ago
- Airflow training for the crunch conf ☆105 Updated 5 years ago
- Execution of DBT models using Apache Airflow through Docker Compose ☆111 Updated last year
- Delta Lake Documentation ☆45 Updated 3 months ago
- Delta-Lake, ETL, Spark, Airflow ☆42 Updated last year
- Apache Airflow in Docker Compose (for both versions 1.10.* and 2.*) ☆184 Updated 9 months ago
- Public source code for the Udemy online course Apache Airflow: Complete Hands-On Beginner to Advanced Class. ☆61 Updated 3 years ago
- ☆22 Updated 3 years ago
- Docker Airflow - Contains a docker compose file for Airflow 2.0 ☆56 Updated 2 years ago
- PySpark data-pipeline testing and CICD ☆28 Updated 3 years ago
- Example of how to leverage Apache Spark distributed capabilities to call REST-API using a UDF ☆47 Updated last year
- Code for dbt tutorial ☆138 Updated 3 months ago
- An example of an ETL pipeline that lays out generic DE processes. This is now out of date but still provides useful information ☆26 Updated 2 years ago
- ☆100 Updated last month
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆445 Updated last year
- Data pipeline with dbt, Airflow, Great Expectations ☆155 Updated 3 years ago
- (project & tutorial) dag pipeline tests + ci/cd setup ☆84 Updated 3 years ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark) ☆41 Updated 5 years ago