GoogleCloudDataproc/hadoop-connectors

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.

☆292

Alternatives and similar repositories for hadoop-connectors

Users that are interested in hadoop-connectors are comparing it to the libraries listed below.

GoogleCloudDataproc / spark-bigquery-connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
☆425 · Updated this week
GoogleCloudDataproc / initialization-actions
Run in all nodes of your cluster before the cluster starts - lets you customize your cluster
☆597 · Jul 6, 2026 · Updated 2 weeks ago
spotify / spark-bigquery
Google BigQuery support for Spark, SQL, and DataFrames
☆156 · Dec 14, 2019 · Updated 6 years ago
GoogleCloudDataproc / bdutil
[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine
☆108 · Nov 15, 2019 · Updated 6 years ago
GoogleCloudDataproc / spark-spanner-connector
Cloud Spanner Connector for Apache Spark
☆18 · Updated this week
GoogleCloudPlatform / spark-on-k8s-gcp-examples
Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub
☆35 · Feb 13, 2018 · Updated 8 years ago
spotify / spydra
Ephemeral Hadoop clusters using Google Compute Platform
☆136 · Mar 31, 2022 · Updated 4 years ago
GoogleCloudPlatform / cloud-bigtable-examples
Examples of how to use Cloud Bigtable both with GCE map/reduce and with standalone applications.
☆236 · Mar 25, 2026 · Updated 3 months ago
googleapis / java-bigtable-hbase
Java libraries and HBase client extensions for accessing Google Cloud Bigtable
☆182 · Updated this week
GoogleCloudPlatform / bigquery-ingest-avro-dataflow-sample
Stream Avro SpecificRecord objects in BigQuery using Cloud Dataflow
☆13 · Jan 4, 2022 · Updated 4 years ago
GoogleCloudPlatform / bigquery-workflows-load
Load data in BigQuery using Cloud Workflows, Firestore and Cloud Functions.
☆11 · May 12, 2021 · Updated 5 years ago
GoogleCloudPlatform / oozie-to-airflow
Oozie Workflow to Airflow DAGs migration tool
☆93 · Jun 15, 2026 · Updated last month
GoogleCloudDataproc / cloud-dataproc
Cloud Dataproc: Samples and Utils
☆205 · Jul 10, 2026 · Updated last week
wepay / kafka-connect-bigquery
DEPRECATED. PLEASE USE https://github.com/confluentinc/kafka-connect-bigquery. A Kafka Connect BigQuery sink connector
☆151 · Mar 4, 2024 · Updated 2 years ago
spotify / scio
A Scala API for Apache Beam and Google Cloud Dataflow.
☆2,627 · Updated this week
data-integrations / wrangler
Wrangler Transform: A DMD system for transforming Big Data
☆108 · Updated this week
kubeflow / spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
☆3,139 · Updated this week
googleapis / java-storage
☆140 · May 20, 2026 · Updated 2 months ago
GoogleCloudPlatform / dataflow-sample-applications
☆131 · Apr 24, 2024 · Updated 2 years ago
sksamuel / sbt-avro4s
Sbt plugin for avro4s
☆20 · May 30, 2018 · Updated 8 years ago
GoogleCloudPlatform / professional-services
Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially…
☆3,046 · Updated this week
GoogleCloudPlatform / DataflowTemplates
Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
☆1,307 · Updated this week
malo-denielou / DataflowSME
Tutorial for Cloud Dataflow
☆17 · Mar 12, 2019 · Updated 7 years ago
GoogleCloudPlatform / datacatalog-connectors-rdbms
Sample code with integration between Data Catalog and RDBMS data sources.
☆72 · Dec 6, 2021 · Updated 4 years ago
knaufk / demo-beam-summit-2018
Python Streaming Pipelines with Beam on Flink - Demo
☆13 · Dec 8, 2022 · Updated 3 years ago
GoogleCloudPlatform / gsutil
A command line tool for interacting with cloud storage services.
☆918 · Updated this week
steveloughran / zero-rename-committer
Paper: A Zero-rename committer for object stores
☆20 · Nov 7, 2025 · Updated 8 months ago
GoogleCloudDataproc / custom-images
Tools for creating Dataproc custom images
☆34 · Jun 12, 2026 · Updated last month
googlearchive / billing-export-python
View billing export files via an App Engine application dashboard.
☆20 · Jun 14, 2017 · Updated 9 years ago
apache / beam
Apache Beam is a unified programming model for Batch and Streaming data processing.
☆8,635 · Updated this week
googleapis / python-dataproc
This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-dataproc
☆49 · Sep 29, 2023 · Updated 2 years ago
apache / livy-website
Mirror of Apache livy (Incubating)
☆13 · Jul 7, 2026 · Updated last week
GoogleCloudDataproc / jupyterhub-dataprocspawner
☆14 · May 27, 2022 · Updated 4 years ago
samelamin / spark-bigquery
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
☆70 · May 8, 2023 · Updated 3 years ago
datumo / dataset-logger
Apache Spark Scala utility to track data records during application execution
☆11 · Jun 12, 2023 · Updated 3 years ago
scalapb / protobuf-scala-runtime
A re-implementation of some com.google.protobuf classes in Scala
☆13 · Updated this week
linkedin / dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
☆135 · Jan 11, 2024 · Updated 2 years ago
ptgoetz / hdfs-cli
Interactive shell for interacting with Hadoop HDFS. Supports multiple HDFS hosts, command line history and tab completion.
☆30 · May 20, 2016 · Updated 10 years ago
mhausenblas / hadoop-data-ingestion
Renders options for ingesting data into Hadoop
☆21 · Jun 18, 2013 · Updated 13 years ago