nebuly-ai / nosLinks

Module to Automatically maximize the utilization of GPU resources in a Kubernetes cluster through real-time dynamic partitioning and elastic quotas - Effortless optimization at its finest!

☆666

Alternatives and similar repositories for nos

Users that are interested in nos are comparing it to the libraries listed below

Sorting:

NVIDIA / KAI-Scheduler
KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
☆707Updated last week
kserve / modelmesh-serving
Controller for ModelMesh
☆239Updated last month
kserve / modelmesh
Distributed Model Serving Framework
☆174Updated last month
run-ai / genv
GPU environment and cluster management with LLM support
☆621Updated last year
NVIDIA / k8s-dra-driver-gpu
NVIDIA DRA Driver for GPUs
☆400Updated last week
grgalex / nvshare
Practical GPU Sharing Without Memory Size Constraints
☆276Updated 4 months ago
kserve / website
User documentation for KServe.
☆106Updated 3 weeks ago
NVIDIA / gpu-feature-discovery
GPU plugin to the node feature discovery for Kubernetes
☆302Updated last year
kubernetes-sigs / jobset
JobSet: a k8s native API for distributed ML training and HPC workloads
☆246Updated this week
nebuly-ai / k8s-device-plugin
NVIDIA device plugin for Kubernetes
☆48Updated last year
kubernetes-sigs / lws
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
☆526Updated this week
deployKF / deployKF
deployKF builds machine learning platforms on Kubernetes. We combine the best of Kubeflow, Airflow†, and MLflow† into a complete platform…
☆441Updated 11 months ago
awslabs / aws-virtual-gpu-device-plugin
AWS virtual gpu device plugin provides capability to use smaller virtual gpus for your machine learning inference workloads
☆205Updated last year
kubernetes-sigs / gateway-api-inference-extension
Gateway API Inference Extension
☆415Updated this week
nebius / soperator
Run Slurm in Kubernetes
☆255Updated this week
kubeflow / mpi-operator
Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
☆487Updated this week
coreweave / tensorizer
Module, Model, and Tensor Serialization/Deserialization
☆250Updated this week
elastic-ai / elastic-gpu-scheduler
elastic-gpu-scheduler is a Kubernetes scheduler extender for GPU resources scheduling.
☆142Updated 2 years ago
triton-inference-server / model_navigator
Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
☆210Updated 3 months ago
armadaproject / armada
A multi-cluster batch queuing system for high-throughput workloads on Kubernetes.
☆538Updated this week
NVIDIA / mig-parted
MIG Partition Editor for NVIDIA GPUs
☆204Updated last week
NVIDIA / ais-k8s
Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes.
☆106Updated last week
Deepomatic / shared-gpu-nvidia-k8s-device-plugin
Fork of NVIDIA device plugin for Kubernetes with support for shared GPUs by declaring GPUs multiple times
☆88Updated 3 years ago
kubeflow / manifests
Kubeflow Deployment Manifests
☆933Updated this week
triton-inference-server / model_analyzer
Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv…
☆481Updated last week
kubeflow / model-registry
Model Registry provides a single pane of glass for ML model developers to index and manage models, versions, and ML artifacts metadata. I…
☆138Updated last week
kserve / open-inference-protocol
Repository for open inference protocol specification
☆59Updated 2 months ago
substratusai / kubeai
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-te…
☆1,029Updated last week
datashim-io / datashim
A kubernetes based framework for hassle free handling of datasets
☆522Updated last week
pytorch / torchx
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup…
☆378Updated this week