GPUprobe / gpuprobe-daemon
Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes
☆92 Updated last month
Alternatives and similar repositories for gpuprobe-daemon:
Users who are interested in gpuprobe-daemon are comparing it to the repositories listed below
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆153 Updated last year
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆82 Updated last year
- Fast OS-level support for GPU checkpoint and restore ☆185 Updated 3 weeks ago
- NCCL Profiling Kit ☆133 Updated 10 months ago
- CUDA checkpoint and restore utility ☆330 Updated 3 months ago
- An I/O benchmark for deep learning applications ☆87 Updated this week
- Extending eBPF Programmability and Observability to GPUs ☆40 Updated this week
- cricket is a virtualization solution for GPUs ☆195 Updated 2 weeks ago
- A tool to detect infrastructure issues on cloud native AI systems ☆34 Updated last month
- DCPerf benchmark suite for hyperscale cloud applications ☆166 Updated this week
- This repository is an archive. Refer to https://github.com/gvirtus/GVirtuS ☆42 Updated 3 years ago
- KV cache store for distributed LLM inference ☆165 Updated this week
- NVIDIA GPUDirect Storage Driver ☆241 Updated this week
- Magnum IO community repo ☆90 Updated 3 months ago
- Artifacts for our NSDI'23 paper TGS ☆75 Updated 10 months ago
- ☆48 Updated 8 months ago
- NVIDIA NCCL Tests for Distributed Training ☆88 Updated 2 weeks ago
- AI flame graph ☆65 Updated this week
- Example code for using DC QP to provide RDMA READ and WRITE operations to remote GPU memory ☆129 Updated 9 months ago
- Multi-GPU communication profiler and visualizer ☆28 Updated 10 months ago
- ☆50 Updated 6 months ago
- An interference-aware scheduler for fine-grained GPU sharing ☆133 Updated 3 months ago
- qCUDA: GPGPU Virtualization at a New API Remoting Method with Para-virtualization ☆122 Updated 3 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆89 Updated last year
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆313 Updated last week
- NVIDIA Inference Xfer Library (NIXL) ☆304 Updated this week
- The criu-coordinator tool aims to enable checkpoint/restore support for distributed applications with CRIU. ☆21 Updated last month
- Microsoft Collective Communication Library ☆65 Updated 5 months ago
- ☆58 Updated 2 months ago
- Live upgrade Linux kernel scheduler subsystem ☆88 Updated last year