NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
☆1,677 · Apr 7, 2026 · Updated last week
Alternatives and similar repositories for dcgm-exporter
Users that are interested in dcgm-exporter are comparing it to the libraries listed below.
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs ☆701 · Mar 30, 2026 · Updated 2 weeks ago
- Nvidia GPU exporter for prometheus using nvidia-smi binary ☆1,453 · Apr 3, 2026 · Updated last week
- NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes ☆2,635 · Updated this week
- Golang bindings for Nvidia Datacenter GPU Manager (DCGM) ☆151 · Updated this week
- NVIDIA device plugin for Kubernetes ☆3,720 · Updated this week
- Go Bindings for the NVIDIA Management Library (NVML) ☆431 · Apr 6, 2026 · Updated last week
- Heterogeneous GPU Sharing on Kubernetes ☆3,257 · Updated this week
- GPU plugin to the node feature discovery for Kubernetes ☆307 · May 27, 2024 · Updated last year
- Tools for monitoring NVIDIA GPUs on Linux ☆1,070 · Nov 2, 2021 · Updated 4 years ago
- ☆349 · Updated this week
- Exporter for machine metrics ☆13,308 · Updated this week
- The NVIDIA Driver Manager is a Kubernetes component which assists in seamless upgrades of the NVIDIA driver on each node of the cluster. ☆52 · Updated this week
- A Cloud Native Batch System (Project under CNCF) ☆5,440 · Updated this week
- A toolkit to run Ray applications on Kubernetes ☆2,432 · Updated this week
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆693 · Updated this week
- Build and run containers leveraging NVIDIA GPUs ☆4,252 · Updated this week
- NVIDIA DRA Driver for GPUs ☆619 · Updated this week
- MIG Partition Editor for NVIDIA GPUs ☆245 · Updated this week
- Node feature discovery for Kubernetes ☆1,017 · Updated this week
- GPU Sharing Scheduler for Kubernetes Cluster ☆1,532 · Dec 29, 2023 · Updated 2 years ago
- This is a place for various problem detectors running on the Kubernetes nodes. ☆3,366 · Apr 7, 2026 · Updated last week
- NVIDIA k8s device plugin for Kubevirt ☆280 · Apr 6, 2026 · Updated last week
- Add-on agent to generate and expose cluster-level metrics. ☆6,108 · Updated this week
- KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale ☆1,233 · Updated this week
- Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes ☆5,332 · Updated this week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆659 · Updated this week
- NVIDIA Network Operator ☆329 · Updated this week
- ☆893 · Apr 2, 2024 · Updated 2 years ago
- NCCL Tests ☆1,485 · Mar 11, 2026 · Updated last month
- NVIDIA container runtime library ☆1,092 · Mar 30, 2026 · Updated 2 weeks ago
- Kubernetes-native Job Queueing ☆2,422 · Updated this week
- HAMi-core compiles libvgpu.so, which ensures hard limit on GPU in container ☆291 · Apr 3, 2026 · Updated last week
- Prometheus exporter for an InfiniBand Fabric ☆70 · Dec 12, 2023 · Updated 2 years ago
- Kubernetes Virtualization API and runtime in order to define and manage virtual machines. ☆6,801 · Updated this week
- Distributed AI Model Training and LLM Fine-Tuning on Kubernetes ☆2,081 · Updated this week
- Repository for out-of-tree scheduler plugins based on scheduler framework. ☆1,283 · Mar 19, 2026 · Updated 3 weeks ago
- Scalable and efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines. ☆6,595 · Updated this week
- Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration ☆5,380 · Updated this week
- OpenAIOS vGPU device plugin for Kubernetes is originated from the OpenAIOS project to virtualize GPU device memory, in order to allow app… ☆588 · May 21, 2024 · Updated last year