jksinghpro / docker-airflow
Docker for Airflow with MySQL as backend
☆13 Updated 5 years ago
Related projects:
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- Sample Airflow DAGs ☆60 Updated last year
- This repository contains code for Spark Streaming ☆21 Updated 3 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆39 Updated 3 years ago
- Airflow training for the crunch conf ☆105 Updated 5 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 Updated 2 years ago
- ☆38 Updated this week
- A repository of sample code to show data quality checking best practices using Airflow. ☆71 Updated last year
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Building Json data pipeline within Snowflake using Streams and Tasks ☆26 Updated 4 years ago
- ETL pipeline using pyspark (Spark - Python) ☆106 Updated 4 years ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,… ☆89 Updated 2 years ago
- Delta Lake Documentation ☆45 Updated 3 months ago
- Curated list of resources about Apache Airflow ☆19 Updated 3 years ago
- ☆26 Updated 4 years ago
- ☆25 Updated last year
- ☆14 Updated 5 years ago
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆167 Updated 10 months ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆104 Updated last week
- PySpark data-pipeline testing and CICD ☆28 Updated 3 years ago
- Execution of DBT models using Apache Airflow through Docker Compose ☆111 Updated last year
- Spark data pipeline that processes movie ratings data. ☆26 Updated last month
- Spark app to merge different schemas ☆23 Updated 3 years ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 Updated 2 years ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆64 Updated 3 years ago
- Building Big Data Pipelines with Apache Beam, published by Packt ☆81 Updated last year
- In this workshop you will launch an Amazon Redshift cluster in your AWS account and load sample data ~ 100GB using TPCH dataset. You will… ☆24 Updated 5 years ago
- CICD pipeline that deploys a dbt image on a GKE cluster ☆11 Updated 3 years ago
- Rules based grant management for Snowflake ☆40 Updated 5 years ago
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆166 Updated 2 years ago