NVIDIA / KAI-Scheduler
KAI Scheduler is an open-source, Kubernetes-native scheduler for large-scale AI workloads
☆549 · Updated this week
Alternatives and similar repositories for KAI-Scheduler
Users who are interested in KAI-Scheduler are comparing it to the repositories listed below.
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆428 · Updated 2 weeks ago
- Gateway API Inference Extension ☆272 · Updated this week
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes ☆355 · Updated this week
- JobSet: a k8s native API for distributed ML training and HPC workloads ☆226 · Updated this week
- Envoy AI Gateway is an open source project for using Envoy Gateway to handle request traffic from application clients to Generative AI se… ☆246 · Updated this week
- Run Slurm in Kubernetes ☆221 · Updated this week
- Controller for ModelMesh ☆229 · Updated this week
- A toolkit for discovering cluster network topology. ☆46 · Updated 2 weeks ago
- ☆110 · Updated last week
- GPU plugin to the node feature discovery for Kubernetes ☆300 · Updated 11 months ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆103 · Updated this week
- ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! ☆167 · Updated this week
- Kubernetes-native Job Queueing ☆1,770 · Updated this week
- NVIDIA Network Operator ☆248 · Updated this week
- ☆150 · Updated 3 weeks ago
- K8s device plugin for GPU sharing ☆100 · Updated 2 years ago
- MIG Partition Editor for NVIDIA GPUs ☆198 · Updated last week
- Device plugins for Volcano, e.g. GPU ☆119 · Updated last month
- elastic-gpu-scheduler is a Kubernetes scheduler extender for GPU resources scheduling. ☆141 · Updated 2 years ago
- AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-te… ☆938 · Updated last week
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆66 · Updated last week
- ☆251 · Updated last week
- A federation scheduler for multi-cluster ☆39 · Updated 2 months ago
- ☆207 · Updated last week
- GenAI inference performance benchmarking tool ☆41 · Updated this week
- CUDA checkpoint and restore utility ☆333 · Updated 3 months ago
- A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project) ☆209 · Updated 2 weeks ago
- AWS virtual gpu device plugin provides capability to use smaller virtual gpus for your machine learning inference workloads ☆204 · Updated last year
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆475 · Updated 3 weeks ago
- Share GPU between Pods in Kubernetes ☆209 · Updated 2 years ago