cnvrg / metagpu
K8s device plugin for GPU sharing
☆100 · Updated last year
Alternatives and similar repositories for metagpu:
Users who are interested in metagpu are comparing it to the repositories listed below:
- JobSet: a k8s native API for distributed ML training and HPC workloads ☆218 · Updated this week
- Example DRA driver that developers can fork and modify to get them started writing their own. ☆69 · Updated 3 weeks ago
- Operator for Multi-Cluster Monitoring with Thanos. ☆132 · Updated this week
- Automatic repair for unhealthy Kubernetes nodes ☆50 · Updated last month
- Kubernetes-in-Kubernetes Made Simple ☆86 · Updated last year
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆64 · Updated 3 weeks ago
- CAPK is a provider for Cluster API (CAPI) that allows users to deploy fake, Kubemark-backed machines to their clusters. ☆72 · Updated 2 weeks ago
- K8s Node Health Check Operator ☆107 · Updated last month
- elastic-gpu-scheduler is a Kubernetes scheduler extender for GPU resources scheduling. ☆140 · Updated 2 years ago
- KJob: Tool for CLI-loving ML researchers ☆26 · Updated this week
- Smart Kubernetes Scheduling ☆78 · Updated this week
- Sidecar container that watches Kubernetes PersistentVolumeClaims objects and triggers controller side expansion operation against a CSI e… ☆130 · Updated this week
- ☆125 · Updated this week
- ☆100 · Updated 3 weeks ago
- Manage Kubernetes node-level kernel tuning (using sysctl). ☆28 · Updated last month
- GenAI inference performance benchmarking tool ☆36 · Updated 2 weeks ago
- A collection of community maintained NRI plugins ☆79 · Updated this week
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆397 · Updated this week
- ☆51 · Updated last year
- Kubernetes Image Puller is used for caching images on a cluster. It creates a DaemonSet downloading and running the relevant container im… ☆248 · Updated 3 weeks ago
- An etcd operator to configure, provision, reconcile and monitor etcd clusters. ☆85 · Updated this week
- The official Kubernetes operator for etcd. ☆57 · Updated last week
- This repo contains sidecar controller and agent for volume health monitoring. ☆67 · Updated 2 weeks ago
- Declarative node network configuration driven through Kubernetes API. ☆218 · Updated this week
- Manage admission policies in your Kubernetes cluster with ease ☆207 · Updated this week
- Provides a general service to support image acceleration based on accelerators such as Nydus and eStargz. ☆85 · Updated 3 weeks ago
- [EOL] Reworking kube-proxy's architecture ☆246 · Updated 9 months ago
- CAAPH uses Helm charts to manage the installation and lifecycle of Cluster API add-ons. ☆143 · Updated last week
- Command-line tool to bootstrap the open-cluster-management control plane. ☆88 · Updated last week
- ☆83 · Updated last month