GoogleCloudPlatform / dataproc-pubsub-spark-streaming
☆31 Updated 6 years ago
Alternatives and similar repositories for dataproc-pubsub-spark-streaming:
Users who are interested in dataproc-pubsub-spark-streaming are comparing it to the libraries listed below.
- ☆46 Updated 8 months ago
- Sample code with integration between Data Catalog and Hive data source. ☆25 Updated this week
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub ☆37 Updated 6 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 5 years ago
- ☆65 Updated 5 months ago
- Cloud Spanner Connector for Apache Spark ☆17 Updated 3 weeks ago
- Multi Cloud Data Tokenization Solution By Using Dataflow and Cloud DLP ☆90 Updated 5 months ago
- ☆127 Updated 9 months ago
- Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration. ☆70 Updated last year
- Apache Airflow CI pipeline ☆19 Updated 5 years ago
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆65 Updated 8 months ago
- Mirror of Apache Beam ☆10 Updated 4 years ago
- Cloud Dataproc: Samples and Utils ☆200 Updated 2 weeks ago
- Commons code used by the Data Catalog connectors, and links for the connectors sample code. ☆61 Updated 3 years ago
- A Giter8 template for scio ☆30 Updated 2 months ago
- These are some code examples ☆55 Updated 5 years ago
- Stream Avro SpecificRecord objects in BigQuery using Cloud Dataflow ☆13 Updated 3 years ago
- Dataproc templates and pipelines for solving simple in-cloud data tasks ☆123 Updated 2 weeks ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ☆16 Updated last year
- ☆81 Updated last year
- Magic to help Spark pipelines upgrade ☆34 Updated 4 months ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆75 Updated 9 months ago
- Data validation library for PySpark 3.0.0 ☆34 Updated 2 years ago
- Template for Spark Projects ☆101 Updated 8 months ago
- This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-dataproc ☆48 Updated last year
- Automatically discover and tag PII data across BigQuery tables and apply column-level access controls based on confidentiality level. ☆47 Updated last week
- Sample processing code using Spark 2.1+ and Scala ☆51 Updated 4 years ago
- This is the support code and solutions for the NYC Taxi Tycoon Dataflow Codelab ☆60 Updated 5 years ago
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 Updated 2 years ago
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor… ☆66 Updated 11 months ago