Initialization actions run on all nodes of your cluster before the cluster starts, letting you customize the cluster
☆600 · Jan 23, 2026 · Updated last month
Alternatives and similar repositories for initialization-actions
Users who are interested in initialization-actions are comparing it to the repositories listed below
- Cloud Dataproc: Samples and Utils · ☆206 · Feb 20, 2026 · Updated last week
- BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. · ☆421 · Feb 19, 2026 · Updated last week
- [DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine · ☆109 · Nov 15, 2019 · Updated 6 years ago
- Labs and demos for courses for GCP Training (http://cloud.google.com/training). · ☆8,460 · Feb 6, 2026 · Updated 3 weeks ago
- Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially… · ☆2,998 · Updated this week
- Cloud Dataflow Google-provided templates for solving in-Cloud data tasks · ☆1,273 · Updated this week
- ☆84 · Jan 26, 2026 · Updated last month
- Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017 · ☆1,406 · Feb 20, 2026 · Updated last week
- Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples · ☆1,538 · Dec 17, 2021 · Updated 4 years ago
- Google BigQuery support for Spark, SQL, and DataFrames · ☆155 · Dec 14, 2019 · Updated 6 years ago
- Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines. This re… · ☆167 · Jul 25, 2018 · Updated 7 years ago
- A Scala API for Apache Beam and Google Cloud Dataflow. · ☆2,615 · Feb 12, 2026 · Updated 2 weeks ago
- ☆31 · Oct 17, 2018 · Updated 7 years ago
- ☆14 · May 27, 2022 · Updated 3 years ago
- Repository with examples and smoke tests for the GCP Airflow operators and hooks · ☆152 · Jan 15, 2017 · Updated 9 years ago
- Opinion Analysis of News, Threaded Conversations, and User Generated Content · ☆108 · Sep 19, 2024 · Updated last year
- Cloud Spanner Connector for Apache Spark · ☆17 · Feb 13, 2026 · Updated 2 weeks ago
- Processing Logs at Scale using Cloud Dataflow · ☆62 · Mar 18, 2019 · Updated 6 years ago
- Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines. · ☆164 · May 31, 2017 · Updated 8 years ago
- ☆278 · Jun 1, 2016 · Updated 9 years ago
- Google Cloud Client Library for Python · ☆5,219 · Updated this week
- Uses Cloud Build to deploy a scalable batch ingestion pipeline consisting of GCS, Cloud Functions, Dataflow and BigQuery · ☆22 · Dec 7, 2022 · Updated 3 years ago
- Code samples used on cloud.google.com · ☆7,981 · Updated this week
- Apache Beam is a unified programming model for Batch and Streaming data processing. · ☆8,492 · Updated this week
- Example Kubernetes app that shows how to build a 'pipeline' to stream data into BigQuery. Uses Redis or Google Cloud PubSub · ☆131 · Oct 20, 2020 · Updated 5 years ago
- A docker image and kubernetes config files to run Airflow on Kubernetes · ☆655 · Jul 19, 2019 · Updated 6 years ago
- ☆130 · Apr 24, 2024 · Updated last year
- A user-space file system for interacting with Google Cloud Storage · ☆2,205 · Updated this week
- Useful scripts, udfs, views, and other utilities for migration and data warehouse operations in BigQuery. · ☆1,278 · Feb 17, 2026 · Updated last week
- Machine Learning on Google Cloud Platform · ☆512 · Feb 20, 2026 · Updated last week
- Modular Google Compute Engine managed instance group for Terraform. · ☆61 · Apr 22, 2021 · Updated 4 years ago
- Builder images and examples commonly used for Google Cloud Build · ☆1,442 · Feb 12, 2026 · Updated 2 weeks ago
- Repository for streaming and batch samples of timeseries data · ☆26 · Feb 23, 2021 · Updated 5 years ago
- Open source tools for Google Cloud Storage and Databases. · ☆63 · May 1, 2024 · Updated last year
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub · ☆37 · Feb 13, 2018 · Updated 8 years ago
- Google Cloud Pubsub connector for Spark Streaming · ☆17 · Oct 21, 2021 · Updated 4 years ago
- Using the Parquet file format (with Avro) to process data with Apache Flink · ☆14 · Aug 17, 2015 · Updated 10 years ago
- Dockerflow is a workflow runner that uses Dataflow to run a series of tasks in Docker with the Pipelines API · ☆101 · Nov 21, 2017 · Updated 8 years ago
- Uses Google Prediction API to label GitHub Issues as they are created. · ☆27 · Dec 5, 2018 · Updated 7 years ago