tfayyaz / cloud-dataproc
Cloud Dataproc: Samples and Utils
☆11 Updated 5 years ago
Alternatives and similar repositories for cloud-dataproc
Users who are interested in cloud-dataproc are comparing it to the repositories listed below.
- ☆130 Updated last year
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆74 Updated last year
- Demo assets for DAIS 2021 'Learn to use Databricks for the full ML lifecycle' Talk ☆14 Updated 4 years ago
- ☆144 Updated 10 months ago
- ☆90 Updated 2 years ago
- Cloud Dataproc: Samples and Utils ☆205 Updated 4 months ago
- CICD pipeline that deploys a dbt image on a GKE cluster ☆11 Updated 4 years ago
- ☆24 Updated 2 years ago
- Data Engineering with Spark and Delta Lake ☆104 Updated 2 years ago
- Building Big Data Pipelines with Apache Beam, published by Packt ☆87 Updated 2 years ago
- Dataproc templates and pipelines for solving in-cloud data tasks ☆134 Updated last month
- dbt (data build tool) projects targeting AWS analytics services (redshift, glue, emr, athena) and open table formats ☆30 Updated 2 years ago
- ☆42 Updated 5 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 Updated 9 months ago
- ☆103 Updated 9 months ago
- Cloud-native, data onboarding architecture for Google Cloud Datasets ☆165 Updated 2 months ago
- Repository of sample Databricks notebooks ☆269 Updated last year
- Speech Analysis Framework, a collection of components and code from Google Cloud that you can use to transcribe audio files to create ana… ☆72 Updated last year
- Repository for Beam College sessions ☆110 Updated 4 years ago
- Sample code with integration between Data Catalog and RDBMS data sources. ☆72 Updated 3 years ago
- PySpark data-pipeline testing and CICD ☆28 Updated 4 years ago
- This repo contains live examples to build Databricks' Lakehouse and recommended best practices from the field. ☆22 Updated last year
- Don't Panic. This guide will help you when it feels like the end of the world. ☆28 Updated last month
- Keep your local python scripts installed and in sync with a databricks notebook. Shortens the feedback loop to develop projects using a h… ☆16 Updated 4 months ago
- Source code for the YouTube video, Apache Beam Explained in 12 Minutes ☆21 Updated 4 years ago
- PDF DataSource for Apache Spark, allow to read PDF files directly to the DataFrame and ocr it ☆75 Updated 5 months ago
- Interactive Notebooks that support the book ☆40 Updated 4 years ago
- ☆66 Updated this week
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆67 Updated 3 years ago
- Multi Cloud Data Tokenization Solution By Using Dataflow and Cloud DLP ☆95 Updated last year