provectus / streaming-data-platform
☆24 Updated 2 years ago
Alternatives and similar repositories for streaming-data-platform:
Users who are interested in streaming-data-platform are comparing it to the repositories listed below.
- Reference Dockerfiles for production usage ☆24 Updated 5 years ago
- Data Quality Gate based on AWS ☆56 Updated 9 months ago
- Data Engineering Digest ☆28 Updated 10 months ago
- 🚀 Deploy Kubeflow on AWS EKS with Terraform 🤖 ☆64 Updated 2 years ago
- Airflow declarative DAGs via YAML ☆132 Updated last year
- Deploy Presto on the cloud easily, using Terraform and Packer ☆44 Updated 2 years ago
- MLOps Platform ☆270 Updated 5 months ago
- ITSumma Spark Greenplum Connector ☆37 Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated last year
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor… ☆66 Updated 2 months ago
- A tool to create Airflow RBAC roles with dag-level permissions from cli. ☆13 Updated last year
- KSQL Syntax Highlighting for VSCode ☆17 Updated 2 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 3 years ago
- Spark on Kubernetes infrastructure Docker images repo ☆37 Updated 2 years ago
- ☆18 Updated 3 years ago
- Testing LLMs and RAG configurations at scale using an OpenAI Reflector ☆11 Updated 3 months ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆94 Updated last week
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆35 Updated last year
- Spark stream from kafka(json) to s3(parquet) ☆15 Updated 6 years ago
- Stores Snowplow enriched events in Redshift, Snowflake and Databricks ☆31 Updated 2 weeks ago
- Aiven's S3 Sink Connector for Apache Kafka® ☆69 Updated 7 months ago
- Apiary provides modules which can be combined to create a federated cloud data lake ☆36 Updated last year
- Serverless proxy for Spark cluster ☆325 Updated 4 years ago
- ☆40 Updated last year
- Demonstration of a Hive Input Format for Iceberg ☆26 Updated 4 years ago
- Multiple node presto cluster on docker container ☆124 Updated 2 years ago
- Ambari stack service for installing and managing Apache Airflow on HDP cluster ☆59 Updated 6 years ago
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆64 Updated last year
- Setup for running Trino with Hive Metastore on Kubernetes ☆101 Updated 2 years ago
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆135 Updated last year