GoogleCloudDataproc / custom-images
Tools for creating Dataproc custom images
☆32 · Updated this week
Alternatives and similar repositories for custom-images:
Users who are interested in custom-images are comparing it to the libraries listed below.
- Data validation library for PySpark 3.0.0 ☆34 · Updated 2 years ago
- Sample code with integration between Data Catalog and Hive data source. ☆25 · Updated this week
- An example PySpark project with pytest ☆17 · Updated 7 years ago
- hive_compared_bq compares/validates two (SQL-like) tables and graphically shows the rows/columns that differ. ☆28 · Updated 7 years ago
- ☆46 · Updated 8 months ago
- Magic to help Spark pipelines upgrade ☆34 · Updated 4 months ago
- The Internals of Spark on Kubernetes ☆70 · Updated 2 years ago
- Cloud Spanner Connector for Apache Spark ☆17 · Updated 3 weeks ago
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆65 · Updated 8 months ago
- A tool to create Airflow RBAC roles with dag-level permissions from cli. ☆13 · Updated last year
- Rules based grant management for Snowflake ☆40 · Updated 5 years ago
- Cloud Dataproc: Samples and Utils ☆200 · Updated 2 weeks ago
- ☆54 · Updated 7 years ago
- Oozie Workflow to Airflow DAGs migration tool ☆88 · Updated last month
- Spark on Kubernetes using Helm ☆34 · Updated 4 years ago
- This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-dataproc ☆48 · Updated last year
- Stream Avro SpecificRecord objects in BigQuery using Cloud Dataflow ☆13 · Updated 3 years ago
- Visualize dependencies between Airflow DAGs ☆49 · Updated 3 years ago
- Cask Hydrator Plugins Repository ☆67 · Updated this week
- ☆19 · Updated last week
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 · Updated 8 years ago
- The open source version of the Amazon Redshift Cluster Management Guide. ☆48 · Updated last year
- A Getting Started Guide for developing and using Airflow Plugins ☆94 · Updated 6 years ago
- XGBoost GPU accelerated on Spark example applications ☆52 · Updated 2 years ago
- Paper: A Zero-rename committer for object stores ☆20 · Updated 3 years ago
- Examples of Spark 3.0 ☆46 · Updated 4 years ago
- A curated list of awesome resources for Apache Beam ☆146 · Updated 2 years ago
- This is the example code repository for Getting Started with Impala by John Russell (O'Reilly Media) ☆22 · Updated 7 years ago
- PySpark data-pipeline testing and CICD ☆28 · Updated 4 years ago
- The go-to demo for public and private dbt Learn ☆74 · Updated 4 months ago