GPUprobe / gpuprobe-daemon
Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes
☆129 Updated 6 months ago
Alternatives and similar repositories for gpuprobe-daemon
Users who are interested in gpuprobe-daemon are comparing it to the repositories listed below.
- CUDA checkpoint and restore utility ☆371 Updated 2 weeks ago
- AI/GPU flame graph ☆185 Updated last week
- cricket is a virtualization solution for GPUs ☆216 Updated 3 weeks ago
- A tool for coordinated checkpoint/restore of distributed applications with CRIU ☆28 Updated last month
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆345 Updated this week
- Fast OS-level support for GPU checkpoint and restore ☆238 Updated this week
- A tool to detect infrastructure issues on cloud native AI systems ☆47 Updated 2 weeks ago
- ☆35 Updated last month
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆85 Updated last year
- DCPerf benchmark suite for hyperscale cloud applications ☆206 Updated this week
- ☆21 Updated 2 months ago
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆167 Updated last year
- Systematic and comprehensive benchmarks for LLM systems. ☆36 Updated last month
- A collection of CUDA programming examples to learn GPU programming ☆30 Updated 3 months ago
- qCUDA: GPGPU Virtualization at a New API Remoting Method with Para-virtualization ☆129 Updated 3 years ago
- GPU scheduler for elastic/distributed deep learning workloads in Kubernetes cluster (IC2E'23) ☆35 Updated last year
- ☆190 Updated last month
- NVIDIA GPUDirect Storage Driver ☆284 Updated last month
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆436 Updated this week
- An OS kernel module for fast **remote** fork using advanced datacenter networking (RDMA). ☆64 Updated 7 months ago
- KV cache store for distributed LLM inference ☆336 Updated 3 weeks ago
- OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) ☆279 Updated this week
- [NSDI '24] DINT: Fast In-Kernel Distributed Transactions with eBPF ☆48 Updated last year
- Practical GPU Sharing Without Memory Size Constraints ☆286 Updated 6 months ago
- ☆52 Updated last year
- InfiniStore: an elastic serverless cloud storage system (VLDB'23) ☆24 Updated 2 years ago
- ☆52 Updated 2 months ago
- ☆70 Updated 7 months ago
- This repository is an archive. Refer to https://github.com/gvirtus/GVirtuS ☆45 Updated 3 years ago
- ☆49 Updated 11 months ago