kubeflow/spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

☆3,142

Alternatives and similar repositories for spark-operator

Users that are interested in spark-operator are comparing it to the libraries listed below.

volcano-sh / volcano
A Cloud Native Batch System (Project under CNCF)
☆5,796 · Updated this week
GoogleCloudPlatform / flink-on-k8s-operator
[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
☆654 · Sep 2, 2022 · Updated 3 years ago
apache / spark-kubernetes-operator
Apache Spark Kubernetes Operator
☆302 · Updated this week
apache / yunikorn-core
Apache YuniKorn Core
☆1,021 · Updated this week
delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…
☆8,924 · Updated this week
apache / kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆2,353 · Updated this week
kubeflow / kubeflow
Machine Learning Toolkit for Kubernetes
☆15,788 · Jul 10, 2026 · Updated last week
kubeflow / trainer
Distributed AI Model Training and LLM Fine-Tuning on Kubernetes
☆2,152 · Updated this week
kubernetes-retired / kube-batch
A batch scheduler of kubernetes for high performance workload, e.g. AI/ML, BigData, HPC
☆1,089 · May 22, 2023 · Updated 3 years ago
palantir / k8s-spark-scheduler
A Kubernetes Scheduler Extender to provide gang scheduling support for Spark on Kubernetes
☆179 · Apr 23, 2023 · Updated 3 years ago
apache-spark-on-k8s / spark
Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the ku…
☆610 · Jan 8, 2020 · Updated 6 years ago
radanalyticsio / spark-operator
Operator for managing the Spark clusters on Kubernetes and OpenShift.
☆159 · Nov 18, 2021 · Updated 4 years ago
apache / flink-kubernetes-operator
Apache Flink Kubernetes Operator
☆1,021 · Updated this week
argoproj / argo-workflows
Workflow Engine for Kubernetes
☆16,839 · Updated this week
apache / iceberg
Apache Iceberg
☆9,070 · Updated this week
apache / hudi
Upserts, Deletes And Incremental Processing on Big Data.
☆6,192 · Updated this week
strimzi / strimzi-kafka-operator
Apache Kafka® running on Kubernetes
☆5,884 · Updated this week
apache-spark-on-k8s / kubernetes-HDFS
Repository holding configuration files for running an HDFS cluster in Kubernetes
☆397 · Sep 25, 2024 · Updated last year
JahstreetOrg / spark-on-kubernetes-helm
Spark on Kubernetes infrastructure Helm charts repo
☆202 · Oct 20, 2022 · Updated 3 years ago
apache / celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
☆1,058 · Jul 16, 2026 · Updated last week
apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
☆43,670 · Updated this week
lyft / flinkk8soperator
Kubernetes operator that provides control plane for managing Apache Flink applications
☆581 · Aug 6, 2025 · Updated 11 months ago
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,576 · Updated this week
datamechanics / delight
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
☆345 · May 31, 2024 · Updated 2 years ago
trinodb / trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
☆13,061 · Updated this week
prometheus-operator / prometheus-operator
Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes
☆9,965 · Updated this week
kubernetes-sigs / kubebuilder
Kubebuilder - SDK for building Kubernetes APIs using CRDs
☆9,254 · Updated this week
uber / RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
☆335 · Sep 29, 2023 · Updated 2 years ago
kubernetes / autoscaler
Autoscaling components for Kubernetes
☆8,910 · Updated this week
aws-samples / eks-spark-benchmark
Performance optimization for Spark running on Kubernetes
☆87 · Aug 18, 2020 · Updated 5 years ago
operator-framework / operator-sdk
SDK for building Kubernetes applications. Provides high level APIs, useful abstractions, and project scaffolding.
☆7,664 · Jul 16, 2026 · Updated last week
Alluxio / alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
☆7,213 · Apr 29, 2025 · Updated last year
apple / batch-processing-gateway
The gateway component to make Spark on K8s much easier for Spark users.
☆221 · May 6, 2026 · Updated 2 months ago
kubernetes-sigs / kustomize
Customization of kubernetes YAML configurations
☆12,113 · Updated this week
kubernetes-sigs / scheduler-plugins
Repository for out-of-tree scheduler plugins based on scheduler framework.
☆1,305 · Jul 9, 2026 · Updated 2 weeks ago
kedacore / keda
KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
☆10,382 · Updated this week
apache / livy
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
☆958 · Jul 9, 2026 · Updated 2 weeks ago
GoogleCloudPlatform / airflow-operator
Kubernetes custom controller and CRDs to managing Airflow
☆297 · Jun 25, 2020 · Updated 6 years ago
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,636 · Updated this week