GPUprobe / gpuprobe-daemonLinks
Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes
☆144Updated 9 months ago
Alternatives and similar repositories for gpuprobe-daemon
Users that are interested in gpuprobe-daemon are comparing it to the libraries listed below
Sorting:
- cricket is a virtualization solution for GPUs☆231Updated 4 months ago
- CUDA checkpoint and restore utility☆401Updated 3 months ago
- Hooked CUDA-related dynamic libraries by using automated code generation tools.☆172Updated 2 years ago
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the…☆359Updated last week
- Fast OS-level support for GPU checkpoint and restore☆267Updated 3 months ago
- Systematic and comprehensive benchmarks for LLM systems.☆47Updated last month
- A tool to detect infrastructure issues on cloud native AI systems☆52Updated 3 months ago
- A tool for coordinated checkpoint/restore of distributed applications with CRIU☆30Updated 4 months ago
- An efficient GPU resource sharing system with fine-grained control for Linux platforms.☆87Updated last year
- Offline optimization of your disaggregated Dynamo graph☆137Updated this week
- AI/GPU flame graph☆238Updated 3 months ago
- DCPerf benchmark suite for hyperscale cloud applications☆225Updated last week
- ☆222Updated 2 weeks ago
- NVIDIA NCCL Tests for Distributed Training☆132Updated this week
- A light weight vLLM simulator, for mocking out replicas.☆76Updated last week
- ☆38Updated 2 months ago
- ☆20Updated 6 months ago
- Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T…☆355Updated this week
- NCCL Profiling Kit☆150Updated last year
- This repository is an archive. Refer to https://github.com/gvirtus/GVirtuS☆44Updated 3 years ago
- ☆72Updated 11 months ago
- KV cache store for distributed LLM inference☆384Updated last month
- GPU scheduler for elastic/distributed deep learning workloads in Kubernetes cluster (IC2E'23)☆34Updated 2 years ago
- qCUDA: GPGPU Virtualization at a New API Remoting Method with Para-virtualization☆132Updated 3 years ago
- Artifacts for our NSDI'23 paper TGS☆95Updated last year
- ☆43Updated last year
- ☆67Updated last year
- NVIDIA GPUDirect Storage Driver☆318Updated 3 weeks ago
- Splits single Nvidia GPU into multiple partitions with complete compute and memory isolation (wrt to performace) between the partitions☆164Updated 6 years ago
- Distributed KV cache scheduling & offloading libraries☆94Updated this week