NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
☆1,626 · Feb 25, 2026 · Updated last week
Alternatives and similar repositories for dcgm-exporter
Users interested in dcgm-exporter are comparing it to the repositories listed below.
- Nvidia GPU exporter for Prometheus using the nvidia-smi binary ☆1,418 · Updated this week
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs ☆671 · Feb 17, 2026 · Updated 2 weeks ago
- NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes ☆2,572 · Updated this week
- Golang bindings for Nvidia Datacenter GPU Manager (DCGM) ☆149 · Feb 14, 2026 · Updated 2 weeks ago
- NVIDIA device plugin for Kubernetes ☆3,679 · Updated this week
- Heterogeneous AI Computing Virtualization Middleware (Project under CNCF) ☆3,047 · Updated this week
- GPU plugin to the node feature discovery for Kubernetes ☆307 · May 27, 2024 · Updated last year
- Go Bindings for the NVIDIA Management Library (NVML) ☆425 · Feb 12, 2026 · Updated 2 weeks ago
- ☆337 · Feb 22, 2026 · Updated last week
- A toolkit to run Ray applications on Kubernetes ☆2,355 · Updated this week
- Tools for monitoring NVIDIA GPUs on Linux ☆1,067 · Nov 2, 2021 · Updated 4 years ago
- Exporter for machine metrics ☆13,186 · Updated this week
- A Cloud Native Batch System (Project under CNCF) ☆5,352 · Updated this week
- NVIDIA DRA Driver for GPUs ☆574 · Updated this week
- GPU Sharing Scheduler for Kubernetes Cluster ☆1,528 · Dec 29, 2023 · Updated 2 years ago
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆673 · Updated this week
- This is a place for various problem detectors running on the Kubernetes nodes. ☆3,347 · Feb 23, 2026 · Updated last week
- Build and run containers leveraging NVIDIA GPUs ☆4,088 · Feb 25, 2026 · Updated last week
- Node feature discovery for Kubernetes ☆1,003 · Updated this week
- NVIDIA k8s device plugin for Kubevirt ☆278 · Updated this week
- Add-on agent to generate and expose cluster-level metrics. ☆6,068 · Feb 23, 2026 · Updated last week
- Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes ☆5,135 · Updated this week
- MIG Partition Editor for NVIDIA GPUs ☆241 · Feb 25, 2026 · Updated last week
- Kubernetes-native Job Queueing ☆2,329 · Updated this week
- NVIDIA container runtime library ☆1,072 · Updated this week
- The NVIDIA Driver Manager is a Kubernetes component that assists in seamless upgrades of the NVIDIA driver on each node of the cluster. ☆49 · Feb 23, 2026 · Updated last week
- NVIDIA Network Operator ☆325 · Updated this week
- ☆892 · Apr 2, 2024 · Updated last year
- KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale ☆1,144 · Updated this week
- HAMi-core compiles libvgpu.so, which enforces a hard limit on GPU resources inside a container ☆278 · Feb 25, 2026 · Updated last week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆631 · Apr 15, 2025 · Updated 10 months ago
- NCCL Tests ☆1,446 · Feb 9, 2026 · Updated 3 weeks ago
- Scalable and efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines. ☆6,544 · Feb 16, 2026 · Updated 2 weeks ago
- Distributed AI Model Training and LLM Fine-Tuning on Kubernetes ☆2,041 · Updated this week
- Using CRDs to manage GPU resources in Kubernetes. ☆209 · Nov 21, 2022 · Updated 3 years ago
- Kubernetes Virtualization API and runtime in order to define and manage virtual machines. ☆6,689 · Updated this week
- Prometheus exporter for an InfiniBand fabric ☆69 · Dec 12, 2023 · Updated 2 years ago
- Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration ☆5,314 · Updated this week
- Blackbox prober exporter ☆5,560 · Feb 11, 2026 · Updated 3 weeks ago