GPUprobe / gpuprobe-daemon
Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes
☆50 · Updated this week
Alternatives and similar repositories for gpuprobe-daemon:
Users who are interested in gpuprobe-daemon are comparing it to the libraries listed below.
- CUDA checkpoint and restore utility ☆292 · Updated 3 weeks ago
- Meta's fleetwide profiler framework ☆106 · Updated 3 months ago
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆296 · Updated this week
- A distributed KV store for disaggregated LLM inference ☆31 · Updated this week
- DCPerf benchmark suite for hyperscale cloud applications ☆157 · Updated this week
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆77 · Updated 10 months ago
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆145 · Updated last year
- A tool to detect infrastructure issues on cloud native AI systems ☆22 · Updated last week
- An I/O benchmark for deep learning applications ☆76 · Updated this week
- cricket is a virtualization solution for GPUs ☆181 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆79 · Updated this week
- ☆30 · Updated 2 months ago
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆91 · Updated this week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆82 · Updated this week
- NCCL Profiling Kit ☆127 · Updated 7 months ago
- ☆41 · Updated 5 months ago
- The criu-coordinator tool aims to enable checkpoint/restore support for distributed applications with CRIU. ☆16 · Updated 8 months ago
- A toolkit for discovering cluster network topology. ☆35 · Updated this week
- ☆42 · Updated 9 months ago
- qCUDA: GPGPU Virtualization at a New API Remoting Method with Para-virtualization ☆118 · Updated 3 years ago
- ☆224 · Updated this week
- Device plugins for Volcano, e.g. GPU ☆114 · Updated 5 months ago
- Intelligent platform for AI workloads ☆37 · Updated 2 years ago
- A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format. ☆29 · Updated last month
- A Top-Down Profiler for GPU Applications ☆17 · Updated 11 months ago
- Fast OS-level support for GPU checkpoint and restore ☆153 · Updated this week
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 · Updated 4 months ago
- Microsoft Collective Communication Library ☆62 · Updated 2 months ago
- A file system over RDMA ☆25 · Updated 2 years ago