GoogleCloudPlatform / oozie-to-airflow
Oozie Workflow to Airflow DAGs migration tool
☆88 · Updated 2 weeks ago
Alternatives and similar repositories for oozie-to-airflow:
Users who are interested in oozie-to-airflow compare it to the libraries listed below.
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 · Updated last year
- The Internals of Spark on Kubernetes ☆70 · Updated 2 years ago
- Task Metrics Explorer ☆13 · Updated 5 years ago
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆135 · Updated last year
- Spline agent for Apache Spark ☆191 · Updated 2 weeks ago
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… ☆212 · Updated 10 months ago
- A simple Spark-powered ETL framework that just works 🍺 ☆181 · Updated 3 weeks ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 · Updated 5 years ago
- Sample code with integration between Data Catalog and Hive data source. ☆25 · Updated last month
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub ☆37 · Updated 7 years ago
- Spark metrics related custom classes and sinks (e.g. Prometheus) ☆180 · Updated 2 years ago
- Magic to help Spark pipelines upgrade ☆34 · Updated 5 months ago
- Repository of helm charts for deploying DataHub on a Kubernetes cluster ☆176 · Updated this week
- Setup for running Trino with Hive Metastore on Kubernetes ☆100 · Updated 2 years ago
- A Spark metrics sink that pushes to InfluxDb ☆51 · Updated 4 years ago
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆185 · Updated 2 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 · Updated 5 years ago
- Cask Hydrator Plugins Repository ☆68 · Updated this week
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆97 · Updated 2 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 · Updated 3 years ago
- A tool to validate data, built around Apache Spark. ☆101 · Updated this week
- An example Apache Beam project. ☆111 · Updated 7 years ago
- Examples of Spark 3.0 ☆47 · Updated 4 years ago
- Pylint plugin for static code analysis on Airflow code ☆93 · Updated 4 years ago
- ☆198 · Updated last year
- Multiple node presto cluster on docker container ☆124 · Updated 2 years ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆120 · Updated 3 weeks ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 · Updated last year
- Presto Trino with Apache Hive Postgres metastore ☆40 · Updated 6 months ago
- Airflow workflow management platform chef cookbook. ☆71 · Updated 5 years ago