provectus / streaming-data-platform
☆24 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for streaming-data-platform
- Reference Dockerfiles for production usage ☆24 · Updated 4 years ago
- Swiss Army Kube (SAK) is an open-source IaC (Infrastructure as Code) collection of services for quick, easy, and controllable deployment … ☆149 · Updated last year
- Data Quality Gate based on AWS ☆57 · Updated 4 months ago
- ITSumma Spark Greenplum Connector ☆34 · Updated 7 months ago
- Airflow declarative DAGs via YAML ☆131 · Updated last year
- 🚀 Deploy Kubeflow on AWS EKS with Terraform 🤖 ☆64 · Updated last year
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 · Updated 3 years ago
- Data Engineering Digest ☆27 · Updated 4 months ago
- Data validation library for PySpark 3.0.0 ☆34 · Updated 2 years ago
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor… ☆66 · Updated 9 months ago
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 · Updated last year
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆65 · Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆86 · Updated 8 months ago
- MLOps Platform ☆271 · Updated 3 weeks ago
- Spark on Kubernetes infrastructure Docker images repo ☆37 · Updated 2 years ago
- ☆40 · Updated last year
- ODD Specification is a universal open standard for collecting metadata. ☆129 · Updated 3 weeks ago
- ☆78 · Updated last year
- Aiven's collection of Single Message Transformations (SMTs) for Apache Kafka Connect ☆74 · Updated 2 weeks ago
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆33 · Updated 11 months ago
- Deploy Presto on the cloud easily, using Terraform and Packer ☆44 · Updated last year
- Example for the article "Running Spark 3 with standalone Hive Metastore 3.0" ☆96 · Updated last year
- Setup for running Trino with Hive Metastore on Kubernetes ☆98 · Updated 2 years ago
- Nested array transformation helper extensions for Apache Spark ☆36 · Updated last year
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆135 · Updated last year
- AWS Glue Schema Registry Client library provides serializers / de-serializers for applications to integrate with AWS Glue Schema Registry… ☆131 · Updated last week
- A tool to create Airflow RBAC roles with DAG-level permissions from the CLI. ☆13 · Updated last year
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 · Updated 2 years ago
- Curated list of resources about Apache Airflow ☆19 · Updated 3 years ago