PacktPublishing / Building-Big-Data-Pipelines-with-Apache-Beam
Building Big Data Pipelines with Apache Beam, published by Packt
(☆81, updated last year)
Related projects:
- Streaming Synthetic Sales Data Generator: streaming sales data generator for Apache Kafka, written in Python (☆43, updated last year)
- Batch processing and orchestration using Apache Airflow and Google Workflows, Spark Structured Streaming, and more (☆19, updated 2 years ago)
- CI/CD pipeline that deploys a dbt image on a GKE cluster (☆11, updated 3 years ago)
- Repository for Beam College sessions (☆101, updated 3 years ago)
- PySpark data-pipeline testing and CI/CD (☆28, updated 3 years ago)
- Dataproc templates and pipelines for solving simple in-cloud data tasks (☆116, updated this week)
- Source code for the YouTube video, Apache Beam Explained in 12 Minutes (☆20, updated 3 years ago)
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… (☆41, updated 2 years ago)
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service (☆63, updated 4 months ago)
- Code Repository for AWS Certified Big Data Specialty 2019 - In Depth and Hands On!, published by Packt (☆38, updated 10 months ago)
- Code Repository for GCP: Complete Google Data Engineer and Cloud Architect Guide(v), published by Packt (☆16, updated last year)
- Interactive notebooks that support the book (☆38, updated 3 years ago)
- Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, … (☆89, updated 2 years ago)
- Apache Beam Python examples and templates (☆14, updated last year)
- Spark data pipeline that processes movie ratings data (☆26, updated last month)
- Apache Beam examples for running on Google Cloud Dataflow (☆30, updated 6 years ago)
- Code for Spark Streaming (☆21, updated 3 years ago)
- A Data Mesh proof-of-concept built on Confluent Cloud (☆2, updated last year)
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment (☆39, updated 3 years ago)
- Example of how to leverage Apache Spark's distributed capabilities to call a REST API using a UDF (☆47, updated last year)
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR (☆65, updated 2 years ago)
- Delta Lake Documentation (☆45, updated 3 months ago)
- How to build an awesome data engineering team (☆99, updated 5 years ago)
- Build a real-time website analytics dashboard on GCP using Dataflow, Cloud Memorystore (Redis), and Spring Boot (☆27, updated 2 weeks ago)