GPUprobe / gpuprobe-daemon
Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes
☆71 · Updated 2 weeks ago
Alternatives and similar repositories for gpuprobe-daemon:
Users interested in gpuprobe-daemon are comparing it to the repositories listed below.
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆78 · Updated 11 months ago
- CUDA checkpoint and restore utility ☆305 · Updated last month
- NVIDIA NCCL Tests for Distributed Training ☆82 · Updated this week
- The criu-coordinator tool aims to enable checkpoint/restore support for distributed applications with CRIU. ☆20 · Updated this week
- A tool to detect infrastructure issues on cloud native AI systems ☆26 · Updated 2 weeks ago
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆149 · Updated last year
- ☆43 · Updated 6 months ago
- Fast OS-level support for GPU checkpoint and restore ☆167 · Updated last week
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆301 · Updated last week
- NVIDIA GPUDirect Storage Driver ☆231 · Updated 3 months ago
- NCCL Profiling Kit ☆127 · Updated 8 months ago
- A distributed KV store for disaggregated LLM inference ☆48 · Updated this week
- ☆42 · Updated 9 months ago
- DCPerf benchmark suite for hyperscale cloud applications ☆159 · Updated last week
- An I/O benchmark for deep learning applications ☆80 · Updated this week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆286 · Updated this week
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆94 · Updated this week
- rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations. ☆50 · Updated 2 months ago
- cricket is a virtualization solution for GPUs ☆187 · Updated 3 weeks ago
- Artifacts for our NSDI'23 paper TGS ☆74 · Updated 9 months ago
- ☆27 · Updated 2 years ago
- A curated list of awesome serverless research works, including papers and open-sourced projects. ☆80 · Updated 2 years ago
- Live upgrade Linux kernel scheduler subsystem ☆88 · Updated last year
- ☆30 · Updated 3 months ago
- 🧯 Kubernetes coverage for fault awareness and recovery; works for any LLMOps, MLOps, or AI workload. ☆26 · Updated 2 months ago
- ☆36 · Updated 3 months ago
- MIG Partition Editor for NVIDIA GPUs ☆189 · Updated this week
- An OS kernel module for fast **remote** fork using advanced datacenter networking (RDMA). ☆59 · Updated 3 weeks ago
- qCUDA: GPGPU Virtualization at a New API Remoting Method with Para-virtualization ☆120 · Updated 3 years ago
- Fine-grained GPU sharing primitives ☆141 · Updated 5 years ago