Module to automatically maximize the utilization of GPU resources in a Kubernetes cluster through real-time dynamic partitioning and elastic quotas. Effortless optimization at its finest!
☆685 · Apr 21, 2024 · Updated last year
Alternatives and similar repositories for nos
Users that are interested in nos are comparing it to the libraries listed below
- NVIDIA device plugin for Kubernetes ☆49 · Feb 16, 2024 · Updated 2 years ago
- NVIDIA DRA Driver for GPUs ☆585 · Updated this week
- NVIDIA device plugin for Kubernetes ☆3,699 · Mar 13, 2026 · Updated last week
- GPU Sharing Scheduler for Kubernetes Cluster ☆1,530 · Dec 29, 2023 · Updated 2 years ago
- A collection of libraries to optimise AI model performances ☆8,350 · Jul 22, 2024 · Updated last year
- Kubernetes-native Job Queueing ☆2,368 · Updated this week
- Heterogeneous GPU Sharing on Kubernetes ☆3,110 · Updated this week
- NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes ☆2,590 · Updated this week
- AWS virtual GPU device plugin provides the capability to use smaller virtual GPUs for your machine learning inference workloads ☆203 · Nov 22, 2023 · Updated 2 years ago
- A Cloud Native Batch System (Project under CNCF) ☆5,381 · Mar 11, 2026 · Updated last week
- Run Slurm in Kubernetes ☆368 · Updated this week
- Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes ☆5,216 · Updated this week
- A Kubernetes operator for creating and managing a cache of container images directly on the cluster worker nodes, so application pods sta… ☆1,369 · Feb 20, 2024 · Updated 2 years ago
- MIG Partition Editor for NVIDIA GPUs ☆244 · Updated this week
- KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale ☆1,181 · Updated this week
- Kubectl Sockperf plugin - Latency Measurement in Kubernetes ☆21 · Nov 26, 2022 · Updated 3 years ago
- JobSet: a k8s native API for distributed ML training and HPC workloads ☆318 · Mar 13, 2026 · Updated last week
- AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-te… ☆1,165 · Feb 23, 2026 · Updated 3 weeks ago
- A Kubernetes plugin that gives context to what is restarting in your Kubernetes cluster ☆155 · Sep 10, 2025 · Updated 6 months ago
- Repository for out-of-tree scheduler plugins based on scheduler framework. ☆1,281 · Updated this week
- Kpad is a simple multiplatform terminal editor born to edit Kubernetes declarative manifest YAML files. ☆45 · Oct 10, 2023 · Updated 2 years ago
- Practical GPU Sharing Without Memory Size Constraints ☆306 · Mar 28, 2025 · Updated 11 months ago
- Multi-tenancy and policy-based framework for Kubernetes. ☆2,046 · Updated this week
- Automatically taint nodes and evict pods based on CPU pressure ☆51 · Dec 23, 2022 · Updated 3 years ago
- Multi-cluster Kubernetes usage analytics for CPU, Memory, and GPU: track costs and optimize cluster resources ☆64 · Mar 18, 2025 · Updated last year
- Resource-adaptive cluster scheduler for deep learning training. ☆453 · Mar 5, 2023 · Updated 3 years ago
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆519 · Updated this week
- Enable dynamic and seamless Kubernetes multi-cluster topologies ☆1,411 · Mar 10, 2026 · Updated last week
- ☆893 · Apr 2, 2024 · Updated last year
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆76 · Jul 18, 2025 · Updated 8 months ago
- A Kubernetes controller for automatically optimizing pod requests based on their continuous usage. VPA alternative that can work with HPA… ☆206 · Feb 9, 2024 · Updated 2 years ago
- K8s device plugin for GPU sharing ☆100 · May 10, 2023 · Updated 2 years ago
- ☆14 · Jan 11, 2023 · Updated 3 years ago
- Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration ☆5,336 · Updated this week
- A light library to allow changing pod log level without restarting the pod ☆12 · Jul 29, 2023 · Updated 2 years ago
- Cost monitoring for Kubernetes workloads and cloud costs ☆6,419 · Mar 13, 2026 · Updated last week
- A Topology-Aware Custom Scheduler For Kubernetes ☆65 · Jul 5, 2023 · Updated 2 years ago
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆682 · Updated this week
- Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, o… ☆9,576 · Mar 15, 2026 · Updated last week