ROCm / device-metrics-exporter
Device Metrics Exporter exports metrics from AMD devices (GPUs) to collectors like Prometheus.
☆28Updated this week
Alternatives and similar repositories for device-metrics-exporter
Users who are interested in device-metrics-exporter are comparing it to the repositories listed below.
- ☆40Updated 3 weeks ago
- Carbon Limiting Auto Tuning for Kubernetes☆37Updated 11 months ago
- ☆87Updated last year
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes☆142Updated this week
- InterLink aims to provide an abstraction for the execution of a Kubernetes pod on any remote resource capable of managing a Container exe…☆90Updated this week
- ☆74Updated this week
- Documentation repository for NVIDIA Cloud Native Technologies☆29Updated this week
- Run Slurm on Kubernetes. A Slinky project.☆173Updated this week
- The NVIDIA Driver Manager is a Kubernetes component which assists in seamless upgrades of the NVIDIA driver on each node of the cluster.☆38Updated this week
- Cloud Native Benchmarking of Foundation Models☆42Updated 2 months ago
- The AMD SMI Exporter exports AMD EPYC CPU & Datacenter GPU metrics to the Prometheus server.☆58Updated 4 months ago
- Network observability for Kubernetes☆45Updated last week
- Manages Highly-Available iSCSI targets, NVMe-oF targets, and NFS exports via LINSTOR☆42Updated 3 months ago
- KJob: Tool for CLI-loving ML researchers☆39Updated this week
- WekaFS Container Storage Interface (CSI) Plugin☆50Updated this week
- Kubernetes Cluster API Provider Virtink☆26Updated 2 years ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.☆130Updated this week
- Deploy a Flux MiniCluster to Kubernetes with the operator☆35Updated last week
- Slurm in Kubernetes☆43Updated last month
- Prometheus exporter for an InfiniBand fabric☆67Updated last year
- nvidia-smi Prometheus exporter that respects GPU UUIDs☆37Updated 2 years ago
- Node feature discovery: detects the available hardware features and configuration in a cluster.☆15Updated last week
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes.☆110Updated last week
- llm-d benchmark scripts and tooling☆30Updated this week
- Practical GPU Sharing Without Memory Size Constraints☆287Updated 6 months ago
- ☆38Updated this week
- Cluster API implementation for Incus and LXD☆53Updated 2 weeks ago
- Model Server for Kepler☆28Updated 2 months ago
- Provides a general service to support image acceleration based on accelerators such as Nydus and eStargz.☆94Updated 2 weeks ago
- Monitors DRBD resources via plugins.☆40Updated last month