provectus / streaming-data-platform
☆24 Updated 2 years ago
Alternatives and similar repositories for streaming-data-platform
Users who are interested in streaming-data-platform are comparing it to the libraries listed below
- Reference Dockerfiles for production usage ☆24 Updated 5 years ago
- Data Quality Gate based on AWS ☆56 Updated 11 months ago
- Swiss Army Kube (SAK) is an open-source IaC (Infrastructure as Code) collection of services for quick, easy, and controllable deployment … ☆148 Updated this week
- ITSumma Spark Greenplum Connector ☆38 Updated last year
- 🚀 Deploy Kubeflow on AWS EKS with Terraform 🤖 ☆64 Updated 2 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 3 years ago
- Data validation library for PySpark 3.0.0 ☆33 Updated 2 years ago
- Airflow declarative DAGs via YAML ☆132 Updated last year
- Spark on Kubernetes infrastructure Docker images repo ☆37 Updated 2 years ago
- Amundsen Gremlin ☆21 Updated 2 years ago
- A K8s-based infrastructure for analytics ☆24 Updated 5 years ago
- A tool to create Airflow RBAC roles with dag-level permissions from cli. ☆13 Updated last year
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor… ☆69 Updated 4 months ago
- This Java library has been designed to facilitate leader election within Kafka clusters providing an efficient and robust solution for di… ☆25 Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated last year
- The Internals of Spark on Kubernetes ☆71 Updated 3 years ago
- Data Engineering Digest ☆28 Updated last year
- Pylint plugin for static code analysis on Airflow code ☆95 Updated 4 years ago
- Examples for using Apache Flink® with DataStream API, Table API, Flink SQL and connectors such as MySQL, JDBC, CDC, Kafka. ☆64 Updated last year
- Apache Spark in your IDE with gradle ☆38 Updated 4 years ago
- Minikube for big data with Scala and Spark ☆15 Updated 5 years ago
- ☆14 Updated 3 weeks ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆64 Updated 3 years ago
- Aiven's collection of Single Message Transformations (SMTs) for Apache Kafka Connect ☆79 Updated 3 weeks ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications ☆36 Updated 6 months ago
- Deploy Presto on the cloud easily, using Terraform and Packer ☆45 Updated 2 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 5 years ago
- Spark to Tableau Extractor library ☆18 Updated 7 years ago
- A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational e… ☆107 Updated last week
- Task Metrics Explorer ☆13 Updated 6 years ago