Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes
☆147 Mar 29, 2025 Updated 11 months ago
Alternatives and similar repositories for gpuprobe-daemon
Users who are interested in gpuprobe-daemon are comparing it to the libraries listed below
- Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime) ☆293 Nov 24, 2025 Updated 3 months ago
- Meta's fleetwide profiler framework ☆342 Sep 22, 2025 Updated 5 months ago
- ☆238 Dec 25, 2025 Updated 2 months ago
- Userspace eBPF runtime for Observability, Network, GPU & General Extensions Framework ☆1,404 Updated this week
- GPU Admin Tools. Includes Confidential Computing controls for H100, and other functionality ☆64 Dec 2, 2025 Updated 2 months ago
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆362 Updated this week
- ☆30 Feb 9, 2026 Updated 2 weeks ago
- ☆20 Jul 10, 2025 Updated 7 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆96 Sep 19, 2025 Updated 5 months ago
- Build NCCL-Tests and configure SSHD in PyTorch container to help you test NCCL faster! ☆13 Aug 28, 2025 Updated 6 months ago
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆171 Dec 12, 2023 Updated 2 years ago
- GeminiFS: A Companion File System for GPUs ☆71 Feb 18, 2025 Updated last year
- Compiler plugin for performance analysis of HIP applications ☆13 Apr 7, 2025 Updated 10 months ago
- Code for "What really matters in matrix-whitening optimizers?" ☆21 Oct 31, 2025 Updated 4 months ago
- ☆40 Jun 30, 2025 Updated 8 months ago
- Fast OS-level support for GPU checkpoint and restore ☆271 Sep 28, 2025 Updated 5 months ago
- A tool for coordinated checkpoint/restore of distributed applications with CRIU ☆31 Feb 15, 2026 Updated last week
- A Proof-of-concept CPU profiler written in Go using eBPF ☆12 Mar 6, 2023 Updated 2 years ago
- [IWQoS 2025] eACGM: An eBPF-based Automated Comprehensive Governance and Monitoring framework for AI/ML systems. ☆21 Aug 11, 2025 Updated 6 months ago
- GVProf: A Value Profiler for GPU-based Clusters ☆53 Mar 24, 2024 Updated last year
- Nsight Compute In Docker ☆13 Dec 21, 2023 Updated 2 years ago
- ☆26 Jun 5, 2025 Updated 8 months ago
- Implementation of the logging layer of our SOSP '23 paper Halfmoon ☆11 Jul 28, 2023 Updated 2 years ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆60 Mar 25, 2025 Updated 11 months ago
- Userspace eBPF Runtime Benchmarking Test Suite and Results ☆16 Apr 21, 2024 Updated last year
- Python bindings for the PMDK. Non-volatile memory for Python. ☆13 Mar 22, 2023 Updated 2 years ago
- ☆15 Jan 7, 2023 Updated 3 years ago
- ☆13 Dec 21, 2025 Updated 2 months ago
- ☆13 Feb 2, 2026 Updated 3 weeks ago
- libsinsp, libscap, the kernel module driver, and the eBPF driver sources ☆301 Feb 20, 2026 Updated last week
- High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph ☆56 Jul 3, 2022 Updated 3 years ago
- cricket is a virtualization solution for GPUs ☆236 Sep 9, 2025 Updated 5 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆476 Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆106 Jun 28, 2025 Updated 8 months ago
- Multi-GPU communication profiler and visualizer ☆38 Jun 10, 2024 Updated last year
- libsinsp, libscap, the kernel module driver, and the eBPF driver sources ☆14 Oct 12, 2024 Updated last year
- Automated build and mirror of eBPF kernel probes for use as a driver with the Falco runtime security agent (https://falco.org/) ☆15 Nov 18, 2024 Updated last year
- ☆13 Jun 23, 2022 Updated 3 years ago
- CUDA checkpoint and restore utility ☆424 Sep 15, 2025 Updated 5 months ago