tfayyaz / cloud-dataproc
Cloud Dataproc: Samples and Utils
☆11 Updated 4 years ago
Alternatives and similar repositories for cloud-dataproc:
Users who are interested in cloud-dataproc compare it to the repositories listed below.
- ☆128 Updated last year
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆69 Updated last year
- CICD pipeline that deploys a dbt image on a GKE cluster ☆11 Updated 3 years ago
- ☆11 Updated last year
- Dataproc templates and pipelines for solving in-cloud data tasks ☆128 Updated last month
- ☆61 Updated 2 weeks ago
- Multi Cloud Data Tokenization Solution By Using Dataflow and Cloud DLP ☆92 Updated 8 months ago
- ☆25 Updated 4 years ago
- Cloud Dataproc: Samples and Utils ☆203 Updated last month
- Interactive Notebooks that support the book ☆40 Updated 4 years ago
- ☆47 Updated last year
- A series of Jupyter notebooks that walk you through Machine Learning with Apache Spark ecosystem using Spark MLlib, PyTorch and TensorFlo… ☆81 Updated last year
- Data Catalog Tag Templates ☆30 Updated 6 months ago
- This repo contains live examples to build Databricks' Lakehouse and recommended best practices from the field. ☆19 Updated 6 months ago
- Automatically discover and tag PII data across BigQuery tables and apply column-level access controls based on confidentiality level. ☆55 Updated 2 weeks ago
- Apache Beam starter repo for Python ☆19 Updated this week
- Materials of the Official Helm Chart Webinar ☆27 Updated 3 years ago
- [DEPRECATED] GAE python based app which regularly collects information about GCP resources and stores them in BigQuery ☆45 Updated last year
- Source code for the MC technical blog post "Data Observability in Practice Using SQL" ☆38 Updated 9 months ago
- Sample code with integration between Data Catalog and BI data sources. ☆32 Updated 3 years ago
- ☆20 Updated 5 years ago
- Building Big Data Pipelines with Apache Beam, published by Packt ☆86 Updated 2 years ago
- ☆84 Updated 2 years ago
- Deploys a secured BigQuery data warehouse ☆83 Updated 3 weeks ago
- ☆137 Updated 5 months ago
- Sample Airflow DAGs to load data from the CovidTracking API to Snowflake via an AWS S3 intermediary. ☆16 Updated 4 years ago
- Sample code with integration between Data Catalog and Hive data source. ☆25 Updated 3 months ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆43 Updated 2 years ago
- Machine Learning in Snowflake ☆24 Updated 5 years ago
- AWS Big Data Certification ☆25 Updated 4 months ago