Azure / azurehpc-health-checks
Health checks for Azure N- and H-series VMs.
☆48 Updated this week
Alternatives and similar repositories for azurehpc-health-checks
Users who are interested in azurehpc-health-checks are comparing it to the repositories listed below.
- NVIDIA NCCL Tests for Distributed Training ☆100 Updated last week
- ☆43 Updated last year
- A tool to detect infrastructure issues on cloud native AI systems ☆44 Updated last week
- ☆64 Updated this week
- NVIDIA Network Operator ☆268 Updated this week
- RDMA CNI plugin for containerized workloads ☆55 Updated this week
- ☆283 Updated last week
- MIG Partition Editor for NVIDIA GPUs ☆204 Updated last week
- Kubernetes Rdma SRIOV device plugin ☆111 Updated 4 years ago
- Cloud Native Benchmarking of Foundation Models ☆39 Updated last month
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆84 Updated last year
- Kubernetes Container Runtime Interface proxy service with hardware resource aware workload placement policies ☆179 Updated last week
- A toolkit for discovering cluster network topology. ☆59 Updated last week
- RDMA device plugin for Kubernetes ☆217 Updated last year
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆69 Updated 2 weeks ago
- Go Abstraction for Allocating NVIDIA GPUs with Custom Policies ☆116 Updated last month
- ☆66 Updated 6 months ago
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆121 Updated last week
- 🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads. ☆31 Updated last week
- DPDK & SR-IOV CNI plugin ☆19 Updated last week
- ☆25 Updated 2 weeks ago
- GenAI inference performance benchmarking tool ☆71 Updated this week
- InfiniBand SR-IOV CNI ☆53 Updated this week
- Azure HPC/AI VM Images ☆114 Updated last week
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆160 Updated last year
- A light weight vLLM simulator, for mocking out replicas. ☆30 Updated this week
- ☆254 Updated last month
- cricket is a virtualization solution for GPUs ☆211 Updated last month
- Systematic and comprehensive benchmarks for LLM systems. ☆21 Updated last month
- A validation and profiling tool for AI infrastructure ☆325 Updated last week