cnvrg / metagpu
K8s device plugin for GPU sharing
☆100 · Updated last year
Alternatives and similar repositories for metagpu:
Users who are interested in metagpu are comparing it to the repositories listed below.
- JobSet: a k8s-native API for distributed ML training and HPC workloads ☆225 · Updated last week
- Example DRA driver that developers can fork and modify to get started writing their own. ☆69 · Updated last month
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆66 · Updated this week
- ☆148 · Updated 3 weeks ago
- Automatic repair for unhealthy Kubernetes nodes ☆51 · Updated 3 weeks ago
- KJob: Tool for CLI-loving ML researchers ☆28 · Updated last week
- GenAI inference performance benchmarking tool ☆41 · Updated this week
- Sidecar container that watches Kubernetes PersistentVolumeClaim objects and triggers controller side expansion operation against a CSI e… ☆130 · Updated this week
- Kubernetes-in-Kubernetes Made Simple ☆86 · Updated last year
- ☆51 · Updated last year
- K8s Node Health Check Operator ☆107 · Updated 2 weeks ago
- Holistic job manager on Kubernetes ☆115 · Updated last year
- CAPK is a provider for Cluster API (CAPI) that allows users to deploy fake, Kubemark-backed machines to their clusters. ☆72 · Updated last month
- Operator for Multi-Cluster Monitoring with Thanos. ☆132 · Updated this week
- Kubernetes Work API ☆66 · Updated this week
- ☆52 · Updated 2 weeks ago
- Kubernetes Image Puller is used for caching images on a cluster. It creates a DaemonSet downloading and running the relevant container im… ☆249 · Updated last week
- Smart Kubernetes Scheduling ☆78 · Updated last week
- Libraries for implementing aggregated apiservers ☆89 · Updated 3 weeks ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆100 · Updated this week
- WG Serving ☆24 · Updated 3 weeks ago
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes ☆352 · Updated last week
- New generation community-driven etcd-operator! ☆114 · Updated this week
- Container Object Storage Interface (COSI) controller responsible for managing the lifecycle of COSI objects. NOTE: The content of this repo has … ☆95 · Updated 5 months ago
- CAAPH uses Helm charts to manage the installation and lifecycle of Cluster API add-ons. ☆145 · Updated last week
- This repository hosts the Multi-Cluster Service APIs. Providers can import packages in this repo to ensure their multi-cluster service co… ☆233 · Updated 2 months ago
- Kubernetes ClusterInventory API ☆67 · Updated last month
- Provides a general service to support image acceleration based on accelerators such as Nydus and eStargz. ☆86 · Updated last month
- Operator for managing Node Feature Discovery deployment ☆69 · Updated last month
- mck8s: Orchestration platform for multi-cluster k8s environments ☆73 · Updated last year