godatadriven-dockerhub / hive-metastore
Hadoop/Hive/Spark container to perform CI tests
☆11 Updated 4 years ago
Alternatives and similar repositories for hive-metastore:
Users who are interested in hive-metastore are comparing it to the repositories listed below.
- ☆13 Updated this week
- ☆47 Updated 6 months ago
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 5 years ago
- Examples of Spark 3.0 ☆46 Updated 4 years ago
- Apiary provides modules which can be combined to create a federated cloud data lake ☆36 Updated 10 months ago
- Magic to help Spark pipelines upgrade ☆34 Updated 4 months ago
- Presto Trino with Apache Hive Postgres metastore ☆39 Updated 5 months ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆97 Updated 2 years ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆24 Updated 5 months ago
- ☆25 Updated 5 months ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆28 Updated this week
- ☆40 Updated last year
- Yet Another (Spark) ETL Framework ☆18 Updated last year
- Docker image for Apache Hive Metastore ☆71 Updated last year
- Data Profiler for AWS Glue Data Catalog application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep… ☆19 Updated 4 years ago
- ☆27 Updated last month
- The Internals of Spark on Kubernetes ☆70 Updated 2 years ago
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆25 Updated 5 years ago
- Spark data pipeline that processes movie ratings data. ☆27 Updated 3 weeks ago
- Spark on Kubernetes using Helm ☆34 Updated 4 years ago
- ☆24 Updated 5 months ago
- Oozie Workflow to Airflow DAGs migration tool ☆88 Updated last month
- Sample Airflow DAGs ☆62 Updated 2 years ago
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub ☆37 Updated 7 years ago
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 Updated 2 years ago
- ☆94 Updated last year
- ☆39 Updated 5 years ago
- Demonstration of a Hive Input Format for Iceberg ☆26 Updated 3 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 3 years ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆94 Updated this week