tfayyaz / cloud-dataproc
Cloud Dataproc: Samples and Utils
☆11 Updated 4 years ago
Alternatives and similar repositories for cloud-dataproc:
Users who are interested in cloud-dataproc are comparing it to the libraries listed below:
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆65 Updated 8 months ago
- ☆127 Updated 9 months ago
- CICD pipeline that deploys a dbt image on a GKE cluster ☆11 Updated 3 years ago
- Dataproc templates and pipelines for solving simple in-cloud data tasks ☆123 Updated 2 weeks ago
- Code Repository for GCP: Complete Google Data Engineer and Cloud Architect Guide(v), Published by Packt ☆16 Updated 2 years ago
- ☆129 Updated 2 months ago
- Sample code with integration between Data Catalog and Hive data source. ☆25 Updated this week
- Source code for the MC technical blog post "Data Observability in Practice Using SQL" ☆36 Updated 6 months ago
- Stream Avro SpecificRecord objects in BigQuery using Cloud Dataflow ☆13 Updated 3 years ago
- A workshop with several modules to help learn Feast, an open-source feature store ☆84 Updated 3 weeks ago
- This repo is a collection of tools to deploy, manage and operate a Databricks based Lakehouse. ☆44 Updated this week
- Markup to create labs for courses from the Google Cloud training catalog. ☆49 Updated 2 years ago
- ☆11 Updated last year
- ☆73 Updated 3 months ago
- ☆63 Updated 2 weeks ago
- Source code for 'BigQuery for Data Warehousing' by Mark Mucchetti ☆16 Updated 4 years ago
- PySpark data-pipeline testing and CICD ☆28 Updated 4 years ago
- Data Engineering with Spark and Delta Lake ☆94 Updated 2 years ago
- Building Big Data Pipelines with Apache Beam, published by Packt ☆84 Updated last year
- Weekly Data Engineering Newsletter ☆94 Updated 6 months ago
- ☆26 Updated 4 years ago
- Automatically discover and tag PII data across BigQuery tables and apply column-level access controls based on confidentiality level. ☆47 Updated last week
- Keep your local python scripts installed and in sync with a databricks notebook. Shortens the feedback loop to develop projects using a h… ☆15 Updated 3 weeks ago
- ☆84 Updated last year
- Docker environment to stream data from Kafka to Iceberg tables ☆24 Updated 11 months ago
- ☆17 Updated 2 years ago
- Source code for the YouTube video, Apache Beam Explained in 12 Minutes ☆20 Updated 4 years ago
- ☆46 Updated 8 months ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 Updated 3 weeks ago
- ☆42 Updated 4 years ago