kai-scheduler/KAI-Scheduler

KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale

☆1,267

Alternatives and similar repositories for KAI-Scheduler

Users that are interested in KAI-Scheduler are comparing it to the libraries listed below.

kubernetes-sigs / lws
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
☆722 · May 9, 2026 · Updated last week
Project-HAMi / HAMi
Heterogeneous GPU Sharing on Kubernetes
☆3,451 · Updated this week
kubernetes-sigs / dra-driver-nvidia-gpu
DRA Driver for NVIDIA GPUs
☆643 · May 13, 2026 · Updated last week
NVIDIA / gpu-operator
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
☆2,701 · Updated this week
ai-dynamo / grove
Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling
☆205 · May 12, 2026 · Updated last week
kubernetes-sigs / kueue
Kubernetes-native Job Queueing
☆2,505 · Updated this week
run-ai / runai-model-streamer
☆302 · Apr 30, 2026 · Updated 2 weeks ago
run-ai / genv
GPU environment and cluster management with LLM support
☆660 · May 16, 2024 · Updated 2 years ago
run-ai / fake-gpu-operator
☆256 · Updated this week
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆6,791 · Updated this week
volcano-sh / volcano
A Cloud Native Batch System (Project under CNCF)
☆5,570 · May 13, 2026 · Updated last week
runai-professional-services / runapy
Python client for the Run:ai REST API
☆25 · Dec 15, 2025 · Updated 5 months ago
ray-project / kuberay
A toolkit to run Ray applications on Kubernetes
☆2,499 · Updated this week
cncf-tags / container-device-interface
☆298 · Apr 16, 2026 · Updated last month
Mellanox / k8s-rdma-shared-dev-plugin
☆357 · May 13, 2026 · Updated last week
NVIDIA / k8s-nim-operator
An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.
☆158 · Updated this week
run-ai / docs
markdown docs
☆96 · Feb 1, 2026 · Updated 3 months ago
run-ai / rntop
A top-like tool for monitoring GPUs in a cluster
☆85 · Feb 14, 2024 · Updated 2 years ago
kubernetes-sigs / inference-perf
GenAI inference performance benchmarking tool
☆188 · Updated this week
kubernetes-sigs / jobset
JobSet: a k8s native API for distributed ML training and HPC workloads
☆324 · May 11, 2026 · Updated last week
kubewharf / godel-scheduler
a unified scheduler for online and offline tasks
☆669 · Mar 2, 2026 · Updated 2 months ago
Mellanox / network-operator
NVIDIA Network Operator
☆337 · Updated this week
llm-d / llm-d
Achieve state of the art inference performance with modern accelerators on Kubernetes
☆3,184 · Updated this week
kubernetes-sigs / gateway-api-inference-extension
Gateway API Inference Extension
☆669 · Updated this week
volcano-sh / kthena
Kubernetes-native AI serving platform for scalable model serving.
☆343 · Updated this week
kubernetes-sigs / scheduler-plugins
Repository for out-of-tree scheduler plugins based on scheduler framework.
☆1,293 · May 11, 2026 · Updated last week
NVIDIA / mig-parted
MIG Partition Editor for NVIDIA GPUs
☆252 · Updated this week
sgl-project / rbg
A workload for deploying LLM inference services on Kubernetes
☆215 · May 9, 2026 · Updated last week
NVIDIA / topograph
A toolkit for discovering cluster network topology.
☆130 · Updated this week
apache / yunikorn-core
Apache YuniKorn Core
☆1,010 · Apr 30, 2026 · Updated 2 weeks ago
NVIDIA / k8s-device-plugin
NVIDIA device plugin for Kubernetes
☆3,755 · Updated this week
ome-projects / ome
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T…
☆450 · Updated this week
NVIDIA / dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
☆1,727 · May 12, 2026 · Updated last week
kserve / kserve
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
☆5,489 · Updated this week
NVIDIA / knavigator
knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes.
☆78 · Apr 14, 2026 · Updated last month
vllm-project / aibrix
Cost-efficient and pluggable Infrastructure components for GenAI inference
☆4,806 · May 13, 2026 · Updated last week
containerd / nri
Node Resource Interface
☆382 · Apr 28, 2026 · Updated 3 weeks ago
Project-HAMi / HAMi-core
HAMi-core compiles libvgpu.so, which ensures hard limit on GPU in container
☆305 · May 9, 2026 · Updated last week
koordinator-sh / koordinator
A QoS-based scheduling system brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, …
☆1,683 · May 13, 2026 · Updated last week