NVIDIA / KAI-SchedulerLinks

KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale

☆869

Alternatives and similar repositories for KAI-Scheduler

Users that are interested in KAI-Scheduler are comparing it to the libraries listed below

Sorting:

NVIDIA / k8s-dra-driver-gpu
NVIDIA DRA Driver for GPUs
☆458Updated last week
kubernetes-sigs / gateway-api-inference-extension
Gateway API Inference Extension
☆501Updated this week
kubernetes-sigs / lws
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
☆601Updated this week
kubernetes-sigs / jobset
JobSet: a k8s native API for distributed ML training and HPC workloads
☆268Updated last week
NVIDIA / grove
Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling
☆71Updated 2 weeks ago
kubernetes-sigs / inference-perf
GenAI inference performance benchmarking tool
☆106Updated last week
NVIDIA / topograph
A toolkit for discovering cluster network topology.
☆74Updated this week
NVIDIA / k8s-nim-operator
An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.
☆130Updated this week
NVIDIA / mig-parted
MIG Partition Editor for NVIDIA GPUs
☆218Updated last week
llm-d / llm-d
Achieve state of the art inference performance with modern accelerators on Kubernetes
☆1,907Updated this week
nebius / soperator
Run Slurm in Kubernetes
☆300Updated this week
kubeflow / mpi-operator
Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
☆499Updated last week
volcano-sh / volcano-global
A federation scheduler for multi-cluster
☆54Updated 4 months ago
run-ai / fake-gpu-operator
☆152Updated this week
kserve / modelmesh-serving
Controller for ModelMesh
☆237Updated 4 months ago
kubernetes-sigs / kueue
Kubernetes-native Job Queueing
☆2,031Updated this week
InftyAI / llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
☆261Updated last week
kubeflow / model-registry
Model Registry provides a single pane of glass for ML model developers to index and manage models, versions, and ML artifacts metadata. I…
☆150Updated last week
awslabs / aws-virtual-gpu-device-plugin
AWS virtual gpu device plugin provides capability to use smaller virtual gpus for your machine learning inference workloads
☆205Updated last year
nebuly-ai / nos
Module to Automatically maximize the utilization of GPU resources in a Kubernetes cluster through real-time dynamic partitioning and elas…
☆672Updated last year
llm-d-incubation / llm-d-infra
llm-d helm charts and deployment examples
☆45Updated 3 weeks ago
Mellanox / k8s-rdma-shared-dev-plugin
☆303Updated this week
nebuly-ai / k8s-device-plugin
NVIDIA device plugin for Kubernetes
☆48Updated last year
NVIDIA / gpu-feature-discovery
GPU plugin to the node feature discovery for Kubernetes
☆305Updated last year
sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
☆292Updated last week
Project-HAMi / HAMi-core
HAMi-core compiles libvgpu.so, which ensures hard limit on GPU in container
☆238Updated 2 weeks ago
armadaproject / armada
A multi-cluster batch queuing system for high-throughput workloads on Kubernetes.
☆560Updated this week
volcano-sh / devices
Device plugins for Volcano, e.g. GPU
☆129Updated 7 months ago
NVIDIA / nvkind
☆175Updated last week
llm-d / llm-d-inference-scheduler
Inference scheduler for llm-d
☆99Updated last week