datastacktv / apache-beam-explained
Source code for the YouTube video "Apache Beam Explained in 12 Minutes".
Related projects:
- Public source code for the Batch Processing with Apache Beam (Python) online course
- Building Big Data Pipelines with Apache Beam, published by Packt
- CI/CD pipeline that deploys a dbt image on a GKE cluster
- Code repository for AWS Certified Big Data Specialty 2019 - In Depth and Hands On!, published by Packt
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform
- Streaming Synthetic Sales Data Generator: streaming sales data generator for Apache Kafka, written in Python
- Markup to create labs for courses from the Google Cloud training catalog
- AWS Big Data Certification
- Supplementary material for Building a Modern Data Platform with Snowflake, from Pearson
- Sample Airflow DAGs
- Sample Airflow DAGs to load data from the CovidTracking API to Snowflake via an AWS S3 intermediary
- Cloud Dataproc: samples and utils
- Code repository for GCP: Complete Google Data Engineer and Cloud Architect Guide, published by Packt
- Code examples for the Introduction to Kubeflow course
- Solution accelerators for Serverless Spark on GCP, the industry's first auto-scaling, serverless Spark-as-a-service
- A real-life, high-throughput streaming ELT data pipeline for ecommerce
- Project files for the post "Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A…"
- A gentle introduction to Machine Learning with Apache Spark
- A data pipeline that automates data warehouse ETL with custom Airflow operators handling the extraction, transformation,…
- A PySpark job to handle upserts, conversion to Parquet, and partition creation on S3
- Amazon EMR Serverless and Amazon MSK Serverless demo
- Data lake and data warehouse on GCP
- Docker environment to stream data from Kafka to Iceberg tables