dunnhumby / democratizing-dataproc
Use Terraform to deploy multiple Dataproc clusters that share a Hive metastore
☆15 Updated 3 years ago
Alternatives and similar repositories for democratizing-dataproc
Users who are interested in democratizing-dataproc compare it to the repositories listed below
- Documentation and implementation of telemetry ingestion on Google Cloud Platform ☆85 Updated last week
- Airflow configuration for Telemetry ☆197 Updated this week
- Multi Cloud Data Tokenization Solution By Using Dataflow and Cloud DLP ☆95 Updated last year
- ☆46 Updated last year
- ☆54 Updated 8 years ago
- Tools for creating Dataproc custom images ☆35 Updated 2 months ago
- Cloud-native, data onboarding architecture for Google Cloud Datasets ☆168 Updated last month
- Repository with examples and smoke tests for the GCP Airflow operators and hooks ☆152 Updated 8 years ago
- Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform ☆259 Updated 2 years ago
- Creates opinionated BigQuery datasets and tables ☆226 Updated 2 weeks ago
- Cloud Build for Deploying Datapipelines with Composer, Dataflow and BigQuery ☆64 Updated 5 years ago
- Example Kubernetes app that shows how to build a 'pipeline' to stream data into BigQuery. Uses Redis or Google Cloud PubSub ☆131 Updated 5 years ago
- Opinion Analysis of News, Threaded Conversations, and User Generated Content ☆106 Updated last year
- Quickly get a kubernetes executor airflow environment provisioned on GKE. Azure Kubernetes Service instructions included also as are inst… ☆36 Updated 5 years ago
- Airflow workflow management platform chef cookbook. ☆71 Updated 6 years ago
- Ephemeral Hadoop clusters using Google Compute Platform ☆134 Updated 3 years ago
- Data pipeline is a tool to run Data loading pipelines. It is an open sourced app engine app that users can extend to suit their own needs… ☆87 Updated 11 years ago
- GCP Plugin for Gordon: Event-driven Cloud DNS ☆12 Updated 2 years ago
- Metadata service library for Amundsen ☆82 Updated this week
- This is the support code and solutions for the NYC Taxi Tycoon Dataflow Codelab ☆63 Updated 6 years ago
- Uses Cloud Build to deploy a scalable batch ingestion pipeline consisting of GCS, Cloud Functions, Dataflow and BigQuery ☆22 Updated 2 years ago
- Data models for snowplow analytics. ☆129 Updated 9 months ago
- Manages Cloud Composer v1 and v2 along with option to manage networking ☆54 Updated 2 weeks ago
- Reference framework for building data workflows provided by Google. Accelerates authentication, logging, scheduling, and deployment of s… ☆173 Updated last year
- Unit and integration testing with PySpark can be tough to figure out, let's make that easier. ☆23 Updated 10 years ago
- Data ingestion library for Amundsen to build graph and search index ☆204 Updated last year
- Example stream processing job, written in Scala with Apache Beam, for Google Cloud Dataflow ☆30 Updated 8 years ago
- Replicates data between Google Cloud BigQuery projects ☆22 Updated 9 years ago
- Cloudera Director sample code ☆61 Updated 6 years ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆114 Updated 4 months ago