GPUprobe / gpuprobe-daemon
Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes
☆140 Updated 8 months ago
Alternatives and similar repositories for gpuprobe-daemon
Users who are interested in gpuprobe-daemon are comparing it to the repositories listed below
- CUDA checkpoint and restore utility ☆393 Updated 2 months ago
- cricket is a virtualization solution for GPUs ☆224 Updated 2 months ago
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆172 Updated last year
- Fast OS-level support for GPU checkpoint and restore ☆257 Updated 2 months ago
- AI/GPU flame graph ☆189 Updated last month
- A tool to detect infrastructure issues on cloud native AI systems ☆52 Updated 2 months ago
- DCPerf benchmark suite for hyperscale cloud applications ☆220 Updated this week
- A tool for coordinated checkpoint/restore of distributed applications with CRIU ☆29 Updated 3 months ago
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆87 Updated last year
- ☆20 Updated 4 months ago
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆353 Updated this week
- qCUDA: GPGPU Virtualization at a New API Remoting Method with Para-virtualization ☆131 Updated 3 years ago
- Offline optimization of your disaggregated Dynamo graph ☆110 Updated this week
- Systematic and comprehensive benchmarks for LLM systems. ☆41 Updated last week
- NCCL Profiling Kit ☆149 Updated last year
- NVIDIA NCCL Tests for Distributed Training ☆126 Updated 2 weeks ago
- ☆37 Updated last month
- OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) ☆322 Updated this week
- ☆209 Updated 3 months ago
- An I/O benchmark for deep learning applications ☆94 Updated 3 weeks ago
- NVIDIA GPUDirect Storage Driver ☆300 Updated 3 months ago
- This repository is an archive. Refer to https://github.com/gvirtus/GVirtuS ☆44 Updated 3 years ago
- Distributed KV cache coordinator ☆88 Updated this week
- Intercepting CUDA runtime calls with LD_PRELOAD ☆43 Updated 11 years ago
- Artifacts for our NSDI'23 paper TGS ☆90 Updated last year
- KV cache store for distributed LLM inference ☆368 Updated 2 weeks ago
- ☆72 Updated 9 months ago
- rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations. ☆57 Updated 4 months ago
- Practical GPU Sharing Without Memory Size Constraints ☆293 Updated 8 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆456 Updated this week