Mellanox / ngc_multinode_perf
Performance tests for multinode NGC.Ready certification
☆15Updated 2 months ago
Alternatives and similar repositories for ngc_multinode_perf
Users who are interested in ngc_multinode_perf are comparing it to the repositories listed below.
- Prometheus exporter for an InfiniBand fabric☆68Updated 2 years ago
- HPK allows running Kubernetes applications within HPC by translating deployments to Slurm and Singularity/Apptainer☆28Updated last week
- ☆43Updated last year
- Run Slurm as a Kubernetes scheduler. A Slinky project.☆58Updated last month
- KNoC is a Kubernetes Virtual Kubelet that uses an HPC cluster as the container execution environment☆21Updated 2 years ago
- nvidia-smi Prometheus exporter that respects GPU UUIDs☆38Updated 2 years ago
- Spectrum Scale Installation and Configuration☆79Updated this week
- NVIDIA NCCL Tests for Distributed Training☆133Updated this week
- InfiniBand SR-IOV CNI☆13Updated last month
- IP over InfiniBand (IPoIB) CNI Plugin☆16Updated this week
- Project to manage Flux tasks needed to standardize Kubernetes HPC scheduling interfaces☆26Updated last week
- A Slurm cluster for Kubernetes☆67Updated last year
- InfiniBand fabric monitoring daemon written in Go☆32Updated 7 months ago
- Intel HPC Containers using Singularity☆19Updated 3 years ago
- ☆70Updated last week
- Environment modules for NGC containers☆29Updated 4 years ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.☆140Updated last week
- Documentation repository for NVIDIA Cloud Native Technologies☆34Updated last week
- Run Slurm on Kubernetes. A Slinky project.☆215Updated this week
- IBM Spectrum LSF - IBM Cloud☆15Updated last year
- A tool to detect infrastructure issues on cloud native AI systems☆52Updated 4 months ago
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes.☆118Updated this week
- Lustre Monitoring System based on collectd, Grafana and InfluxDB☆46Updated 2 years ago
- Tools to deploy GPU clusters in the Cloud☆34Updated 2 years ago
- A terminal based monitoring tool for InfiniBand networks using Detector (https://github.com/hhu-bsinfo/detector)☆14Updated 6 years ago
- Run cloud native workloads on NVIDIA GPUs☆215Updated this week
- Prometheus collector and exporter for Slurm cluster metrics. A Slinky project.☆15Updated 2 months ago
- MPI Microbenchmarks☆46Updated 9 years ago
- Device Metrics Exporter exports metrics from AMD devices (GPUs) to collectors like Prometheus.☆40Updated this week
- Lustre Monitoring System☆26Updated 10 months ago