tfayyaz / cloud-dataproc
Cloud Dataproc: Samples and Utils
☆11Updated 4 years ago
Alternatives and similar repositories for cloud-dataproc
Users that are interested in cloud-dataproc are comparing it to the libraries listed below
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service☆68Updated last year
- Data lake, data warehouse on GCP☆56Updated 3 years ago
- CICD pipeline that deploys a dbt image on a GKE cluster☆11Updated 3 years ago
- dbt (data build tool) projects targeting AWS analytics services (redshift, glue, emr, athena) and open table formats☆29Updated 2 years ago
- ☆47Updated last year
- Dataproc templates and pipelines for solving in-cloud data tasks☆129Updated this week
- Sample code with integration between Data Catalog and Hive data source.☆24Updated 4 months ago
- ☆80Updated 7 months ago
- The go-to demo for public and private dbt Learn☆77Updated 2 months ago
- ☆128Updated last year
- Data Catalog Tag Templates☆30Updated 3 weeks ago
- Covid19 and Iowa Liquor Sales analysis in BigQuery using dbt, Airflow, Marquez, Google Cloud and other modern data stack tools☆14Updated 2 years ago
- Cloned by the `dbt init` task☆61Updated last year
- Source code for the MC technical blog post "Data Observability in Practice Using SQL"☆38Updated 10 months ago
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR☆66Updated 3 years ago
- How to Automate SQL: dbt (data build tool) tutorial on BigQuery with extensive NOTES☆32Updated last year
- dbt / Amazon Redshift Demonstration Project☆34Updated 2 years ago
- 📆 Run, schedule, and manage your dbt jobs using Kubernetes.☆24Updated 6 years ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python☆44Updated 2 years ago
- ☆22Updated last month
- A bunch of hacks developed around dbt☆48Updated 5 years ago
- Ingesting data with Pulumi, AWS lambdas and Snowflake in a scalable, fully replayable manner☆71Updated 3 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A…☆41Updated 2 years ago
- A series of Jupyter notebooks that walk you through Machine Learning with Apache Spark ecosystem using Spark MLlib, PyTorch and TensorFlo…☆82Updated last year
- Run dbt serverless in the Cloud (AWS)☆42Updated 5 years ago
- Rules based grant management for Snowflake☆40Updated 6 years ago
- Cloud Build for Deploying Data Pipelines with Composer, Dataflow and BigQuery☆64Updated 4 years ago
- ☆34Updated 2 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform☆47Updated 4 months ago
- ☆38Updated 4 years ago