ROCm / device-metrics-exporter
Device Metrics Exporter exports metrics from AMD devices (GPUs) to collectors like Prometheus.
☆39Updated 2 weeks ago
Alternatives and similar repositories for device-metrics-exporter
Users who are interested in device-metrics-exporter are comparing it to the libraries listed below.
- The AMD SMI Exporter exports AMD EPYC CPU & Datacenter GPU metrics to the Prometheus server.☆64Updated 7 months ago
- The NVIDIA Driver Manager is a Kubernetes component that assists in seamless upgrades of the NVIDIA driver on each node of the cluster.☆46Updated this week
- Run Slurm on Kubernetes. A Slinky project.☆213Updated last week
- Carbon Limiting Auto Tuning for Kubernetes☆37Updated last year
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes☆152Updated this week
- Documentation repository for NVIDIA Cloud Native Technologies☆34Updated this week
- Deploy a Flux MiniCluster to Kubernetes with the operator☆38Updated last month
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers.☆150Updated this week
- Provides a general service to support image acceleration based on various accelerators such as Nydus and eStargz.☆94Updated 2 months ago
- ☆40Updated 3 weeks ago
- KJob: Tool for CLI-loving ML researchers☆40Updated last week
- OpenAPI Golang client library for Slurm REST API. A Slinky project.☆20Updated 3 weeks ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.☆140Updated 3 weeks ago
- ☆91Updated this week
- ☆275Updated 3 weeks ago
- Linux Traffic Control (TC) based implementation of Kubernetes NPWG MultiNetworkPolicy API☆12Updated 2 years ago
- A Slurm cluster for Kubernetes☆66Updated last year
- Prometheus exporter for an InfiniBand fabric☆68Updated 2 years ago
- RDMA CNI plugin for containerized workloads☆58Updated 2 weeks ago
- ☆87Updated last year
- ☆70Updated 2 weeks ago
- A storage plugin that provides CRI-O/Podman with the ability to lazily mount Nydus images.☆40Updated 7 months ago
- Practical GPU Sharing Without Memory Size Constraints☆296Updated 9 months ago
- Manages Highly-Available iSCSI targets, NVMe-oF targets, and NFS exports via LINSTOR☆45Updated last month
- API for coordinating Maintenance in Kubernetes.☆26Updated 5 months ago
- ☆23Updated 3 weeks ago
- A collection of useful Go libraries to ease the development of NVIDIA Operators for GPU/NIC management.☆26Updated last month
- IP Over Infiniband (IPoIB) CNI Plugin☆16Updated 2 weeks ago
- A Lustre container storage interface that allows Kubernetes to mount/unmount provisioned Lustre filesystems into containers.☆44Updated 3 weeks ago
- A toolkit for discovering cluster network topology.☆89Updated 3 weeks ago