provectus / streaming-data-platform
☆24 Updated 2 years ago
Alternatives and similar repositories for streaming-data-platform
Users who are interested in streaming-data-platform are comparing it to the libraries listed below:
- Swiss Army Kube (SAK) is an open-source IaC (Infrastructure as Code) collection of services for quick, easy, and controllable deployment … ☆149 Updated last year
- Reference Dockerfiles for production usage ☆24 Updated 5 years ago
- Data Quality Gate based on AWS ☆57 Updated 7 months ago
- 🚀 Deploy Kubeflow on AWS EKS with Terraform 🤖 ☆64 Updated 2 years ago
- ITSumma Spark Greenplum Connector ☆36 Updated 10 months ago
- Airflow declarative DAGs via YAML ☆132 Updated last year
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 3 years ago
- A tool to create Airflow RBAC roles with DAG-level permissions from the CLI. ☆13 Updated last year
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor… ☆66 Updated last year
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆65 Updated 3 years ago
- Data validation library for PySpark 3.0.0 ☆34 Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated 11 months ago
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆65 Updated last year
- Data Engineering Digest ☆27 Updated 7 months ago
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆35 Updated last year
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 Updated 2 years ago
- Reference architecture for real-time stream processing with Apache Flink on Amazon EMR, Amazon Kinesis, and Amazon Elasticsearch Service. ☆71 Updated last year
- A Kafka Connect Source Connector for DynamoDB ☆59 Updated 9 months ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆48 Updated 2 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 Updated last month
- ☆47 Updated 6 months ago
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆84 Updated 2 years ago
- Deploy Presto on the cloud easily, using Terraform and Packer ☆44 Updated last year
- ETLy is an add-on dashboard service on top of Apache Airflow. ☆69 Updated last year
- ☆25 Updated 5 months ago
- Spark to Tableau Extractor library ☆18 Updated 7 years ago
- Setup for running Trino with Hive Metastore on Kubernetes ☆99 Updated 2 years ago
- A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational e… ☆103 Updated 2 months ago
- A CLI to manage and monitor permissions in AWS Lake Formation ☆26 Updated 2 years ago
- The open source version of the Amazon Redshift Cluster Management Guide. ☆48 Updated last year