BGBiao / gpu-monitoring-tools

Tools for monitoring NVIDIA GPUs on Linux

☆9

Alternatives and similar repositories for gpu-monitoring-tools

Users that are interested in gpu-monitoring-tools are comparing it to the libraries listed below


tkestack / gpu-admission
☆132 · Updated 4 years ago
AliyunContainerService / et-operator
Kubernetes Operator for AI and Bigdata Elastic Training
☆87 · Updated 6 months ago
PaddleFlow / paddle-operator
Elastic Deep Learning Training based on Kubernetes by Leveraging EDL and Volcano
☆32 · Updated 2 years ago
Mellanox / k8s-rdma-sriov-dev-plugin
Kubernetes RDMA SR-IOV device plugin
☆111 · Updated 4 years ago
pokerfaceSad / GPUMounter
A Kubernetes plugin that enables dynamically adding or removing GPU resources for a running Pod
☆127 · Updated 3 years ago
gpucloud / k8s-device-plugin
NVIDIA device plugin for Kubernetes
☆15 · Updated 5 years ago
hustcat / k8s-rdma-device-plugin
RDMA device plugin for Kubernetes
☆217 · Updated last year
virtaitech / orion
☆277 · Updated 2 years ago
tkestack / go-nvml
☆32 · Updated 4 years ago
NVIDIA / go-gpuallocator
Go Abstraction for Allocating NVIDIA GPUs with Custom Policies
☆116 · Updated this week
volcano-sh / resource-exporter
Resource Exporter for volcano scheduling, e.g. NUMA-Aware scheduling.
☆17 · Updated 2 months ago
kubeflow / caffe2-operator
Experimental repository for a caffe2 operator
☆16 · Updated 3 years ago
kubedl-io / morphling
Automatic tuning for ML model deployment on Kubernetes
☆80 · Updated 9 months ago
kleveross / ftlib
Fault tolerance for DL frameworks
☆70 · Updated 2 years ago
baidu / paddle-on-k8s-operator
Kubernetes operator for managing the lifecycle of PaddlePaddle jobs.
☆24 · Updated 5 years ago
volcano-sh / devices
Device plugins for Volcano, e.g. GPU
☆126 · Updated 4 months ago
chanjarster / kubebuilder-mix-codegen-how-to
Mix kubebuilder and code-generator example
☆23 · Updated 5 years ago
Project-HAMi / volcano-vgpu-device-plugin
Device plugin for Volcano vGPU that supports hard resource isolation
☆98 · Updated last month
Mellanox / k8s-rdma-shared-dev-plugin
☆283 · Updated this week
elastic-ai / elastic-gpu-scheduler
elastic-gpu-scheduler is a Kubernetes scheduler extender for GPU resources scheduling.
☆142 · Updated 2 years ago
kubeflow / mxnet-operator
A Kubernetes operator for mxnet jobs
☆53 · Updated 3 years ago
Qihoo360 / dgl-operator
The DGL Operator makes it easy to run Deep Graph Library (DGL) graph neural network training on Kubernetes
☆44 · Updated 3 years ago
vmware-archive / bitfusion-with-kubernetes-integration
Bitfusion with Kubernetes Integration Support
☆50 · Updated last year
kubeflow / common
Common APIs and libraries shared by other Kubeflow operator repositories.
☆52 · Updated 2 years ago
elastic-ai / elastic-gpu
Using CRDs to manage GPU resources in Kubernetes.
☆207 · Updated 2 years ago
Mr-Linus / Yoda-Scheduler
Yoda is a Kubernetes scheduler based on GPU metrics.
☆139 · Updated 3 years ago
Mellanox / sriov-cni
DPDK & SR-IOV CNI plugin
☆19 · Updated 2 weeks ago
Project-HAMi / HAMi-core
HAMi-core compiles libvgpu.so, which enforces a hard limit on GPU resources inside a container
☆195 · Updated last week
NVIDIA / go-dcgm
Golang bindings for Nvidia Datacenter GPU Manager (DCGM)
☆125 · Updated last week
kleveross / klever-model-registry
Cloud Native Machine Learning Model Registry
☆81 · Updated 2 years ago