kubeflow / spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
☆2,744 · Updated this week
Related projects:
- [DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications. ☆658 · Updated 2 years ago
- Apache Flink Kubernetes Operator ☆785 · Updated last week
- Kubernetes operator that provides control plane for managing Apache Flink applications ☆562 · Updated 3 weeks ago
- Apache YuniKorn Core ☆819 · Updated this week
- Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides… ☆2,734 · Updated last week
- The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, i… ☆646 · Updated 2 months ago
- Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the ku… ☆612 · Updated 4 years ago
- Apache Iceberg ☆6,161 · Updated this week
- Repository holding configuration files for running an HDFS cluster in Kubernetes ☆398 · Updated last year
- Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere. ☆880 · Updated this week
- Upserts, Deletes And Incremental Processing on Big Data. ☆5,330 · Updated this week
- Confluent Schema Registry for Kafka ☆2,194 · Updated this week
- Apache Spark docker image ☆2,034 · Updated last year
- Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. ☆2,071 · Updated this week
- The Confluent Platform Helm charts enable you to deploy Confluent Platform services on Kubernetes for development, test, and proof of con… ☆787 · Updated 7 months ago
- Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse clusters running on Kubernetes ☆1,862 · Updated this week
- Kafka cluster as Kubernetes StatefulSet, plain manifests and config ☆1,837 · Updated 3 months ago
- A Cloud Native Batch System (Project under CNCF) ☆4,071 · Updated this week
- Pravega - Streaming as a new software defined storage primitive ☆1,982 · Updated 2 weeks ago
- ☆1,606 · Updated this week
- Apache Flink Stateful Functions ☆504 · Updated 4 months ago
- A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga… ☆2,214 · Updated this week
- Kafka Consumer Lag Checking ☆3,725 · Updated last month
- Event-driven Automation Framework for Kubernetes ☆2,336 · Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,252 · Updated last week
- Apache Atlas ☆1,813 · Updated 2 weeks ago
- An extensible distributed system for reliable nearline data streaming at scale ☆911 · Updated 3 months ago
- Nessie: Transactional Catalog for Data Lakes with Git-like semantics ☆984 · Updated this week
- An Open Standard for lineage metadata collection ☆1,708 · Updated this week
- Collect, aggregate, and visualize a data ecosystem's metadata ☆1,732 · Updated this week