leptonai / gpud
GPUd automates monitoring, diagnostics, and issue identification for GPUs
☆286 Updated this week
Alternatives and similar repositories for gpud:
Users who are interested in gpud are comparing it to the libraries listed below.
- Efficient and easy multi-instance LLM serving ☆332 Updated this week
- A Kubernetes plugin that enables dynamically adding or removing GPU resources for a running Pod ☆124 Updated 3 years ago
- ☆296 Updated 7 months ago
- NVIDIA NCCL Tests for Distributed Training ☆84 Updated last week
- ☆237 Updated this week
- CUDA checkpoint and restore utility ☆306 Updated last month
- A distributed KV store for disaggregated LLM inference ☆62 Updated this week
- Kubernetes Operator for AI and Bigdata Elastic Training