Cloud Dataproc: Samples and Utils
☆204Mar 17, 2026Updated last month
Alternatives and similar repositories for cloud-dataproc
Users that are interested in cloud-dataproc are comparing it to the libraries listed below.
- Run in all nodes of your cluster before the cluster starts - lets you customize your cluster☆597Apr 24, 2026Updated last week
- Cloud Spanner Connector for Apache Spark☆18Apr 24, 2026Updated last week
- dbt module for myBI connect☆13Jan 31, 2023Updated 3 years ago
- BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.☆422Updated this week
- Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples☆1,544Dec 17, 2021Updated 4 years ago
- Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.☆289Updated this week
- An introduction to Jupyter and Jupyter Labs for data analysis, data science, and Python development☆14Oct 13, 2018Updated 7 years ago
- Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017☆1,428Feb 20, 2026Updated 2 months ago
- Google Cloud Dataflow pipelines such as Identity-By-State as well as useful utility classes.☆37Aug 9, 2023Updated 2 years ago
- This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-bigquery-datatransfer☆84Sep 29, 2023Updated 2 years ago
- Ephemeral Hadoop clusters using Google Compute Platform☆136Mar 31, 2022Updated 4 years ago
- Examples of how to use Cloud Bigtable both with GCE map/reduce as well as stand alone applications.☆235Mar 25, 2026Updated last month
- Commons code used by the Data Catalog connectors, and links for the connectors sample code.☆61Nov 24, 2021Updated 4 years ago
- Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially…☆3,019Updated this week
- Source Code for 'Beginning Apache Spark 3' by Hien Luu☆13Oct 14, 2021Updated 4 years ago
- ☆14May 27, 2022Updated 3 years ago
- Batch processing, orchestration using Apache Airflow and Google Workflows, Spark Structured Streaming, and a lot more☆18Jun 21, 2022Updated 3 years ago
- ☆27May 1, 2024Updated 2 years ago
- Example Kubernetes app that shows how to build a 'pipeline' to stream data into BigQuery. Uses Redis or Google Cloud PubSub☆131Oct 20, 2020Updated 5 years ago
- Labs and demos for courses for GCP Training (http://cloud.google.com/training).☆8,507Updated this week
- Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-dis…☆21Mar 15, 2024Updated 2 years ago
- Sample code with integration between Data Catalog and RDBMS data sources.☆71Dec 6, 2021Updated 4 years ago
- ☆277Jun 1, 2016Updated 9 years ago
- Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset☆15Jul 16, 2017Updated 8 years ago
- An example that shows how to periodically launch a Dataflow analytics pipeline from GAE Flex, that reads from Datastore.☆42Oct 24, 2017Updated 8 years ago
- Code samples for using Python on Google Cloud Platform☆816Apr 13, 2026Updated 2 weeks ago
- Demo Codes will be shared here☆52Nov 19, 2025Updated 5 months ago
- Google Datalab Library☆192Sep 2, 2022Updated 3 years ago
- ☆17Mar 25, 2026Updated last month
- In this workshop you will launch an Amazon Redshift cluster in your AWS account and load sample data ~ 100GB using TPCH dataset. You will…☆24Nov 28, 2018Updated 7 years ago
- ☆89Mar 25, 2026Updated last month
- An Apache Spark app for making data movement between Apache Hive and Apache Phoenix/HBase☆14Mar 23, 2016Updated 10 years ago
- Albis: High-Performance File Format for Big Data Systems☆21Jul 12, 2018Updated 7 years ago
- Custom Google Spreadsheet functions using the Looker API☆27Feb 15, 2018Updated 8 years ago
- Input pipeline framework☆990Aug 6, 2025Updated 8 months ago
- Integration of TensorFlow with other open-source frameworks☆1,376Sep 25, 2024Updated last year
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub☆37Feb 13, 2018Updated 8 years ago
- Get introduced to Directed Acyclic Graphs (DAGs) through Dagster with a simple ML program☆13Apr 19, 2023Updated 3 years ago
- Google Cloud Datalab samples and documentation☆336Sep 2, 2022Updated 3 years ago