Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes
☆147 · Mar 29, 2025 · Updated 11 months ago
Alternatives and similar repositories for gpuprobe-daemon
Users who are interested in gpuprobe-daemon are comparing it to the repositories listed below
- Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime) ☆295 · Nov 24, 2025 · Updated 3 months ago
- Meta's fleetwide profiler framework ☆345 · Sep 22, 2025 · Updated 5 months ago
- ☆244 · Dec 25, 2025 · Updated 2 months ago
- Fast OS-level support for GPU checkpoint and restore ☆277 · Sep 28, 2025 · Updated 5 months ago
- ☆21 · Jul 10, 2025 · Updated 8 months ago
- ☆30 · Feb 27, 2026 · Updated 3 weeks ago
- ☆26 · Jun 5, 2025 · Updated 9 months ago
- GPU Admin Tools. Includes Confidential Computing controls for H100, and other functionality ☆65 · Dec 2, 2025 · Updated 3 months ago
- ☆12 · May 13, 2025 · Updated 10 months ago
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆171 · Dec 12, 2023 · Updated 2 years ago
- Security Observability Framework for ML/AI Model File Loading ☆43 · Aug 20, 2025 · Updated 7 months ago
- ☆40 · Jun 30, 2025 · Updated 8 months ago
- AI/GPU flame graph ☆253 · Feb 18, 2026 · Updated last month
- ☆15 · Jan 7, 2023 · Updated 3 years ago
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆366 · Mar 12, 2026 · Updated last week
- A multi-level dataflow tracer for capturing I/O calls from workflows. ☆21 · Mar 14, 2026 · Updated last week
- eBPF verifier based on abstract interpretation ☆456 · Updated this week
- GeminiFS: A Companion File System for GPUs ☆72 · Feb 18, 2025 · Updated last year
- Nsight Compute In Docker ☆13 · Dec 21, 2023 · Updated 2 years ago
- Build NCCL-Tests and configure SSHD in PyTorch container to help you test NCCL faster! ☆13 · Aug 28, 2025 · Updated 6 months ago
- CUDA checkpoint and restore utility ☆429 · Sep 15, 2025 · Updated 6 months ago
- Userspace eBPF Runtime Benchmarking Test Suite and Results ☆16 · Updated this week
- Implementation of the Reusable Enclaves paper ☆14 · Sep 25, 2023 · Updated 2 years ago
- Python bindings for the PMDK. Non-volatile memory for Python. ☆13 · Mar 22, 2023 · Updated 2 years ago
- PalanTír: Optimizing Attack Provenance with Hardware-enhanced System Observability, ACM CCS'22 ☆24 · Nov 11, 2024 · Updated last year
- [IWQoS 2025] eACGM: An eBPF-based Automated Comprehensive Governance and Monitoring framework for AI/ML systems. ☆21 · Aug 11, 2025 · Updated 7 months ago
- Code for "What really matters in matrix-whitening optimizers?" ☆23 · Oct 31, 2025 · Updated 4 months ago
- eBPF Security Monitoring Agent Based on Aya ☆40 · Updated this week
- High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph ☆55 · Jul 3, 2022 · Updated 3 years ago
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated 2 years ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆97 · Sep 19, 2025 · Updated 6 months ago
- A tool for coordinated checkpoint/restore of distributed applications with CRIU ☆31 · Mar 2, 2026 · Updated 2 weeks ago
- cricket is a virtualization solution for GPUs ☆236 · Sep 9, 2025 · Updated 6 months ago
- libsinsp, libscap, the kernel module driver, and the eBPF driver sources ☆302 · Mar 12, 2026 · Updated last week
- Artifacts for ATC '22 paper "Faster Software Packet Processing on FPGA NICs with eBPF Program Warping" ☆17 · May 20, 2022 · Updated 3 years ago
- Toolchain built around the Megatron-LM for Distributed Training ☆90 · Mar 5, 2026 · Updated 2 weeks ago
- ☆13 · Jun 23, 2022 · Updated 3 years ago
- GVProf: A Value Profiler for GPU-based Clusters ☆53 · Mar 24, 2024 · Updated last year
- Zero-instrumentation LLM and AI agent (e.g. claude code, gemini-cli) observability in eBPF ☆235 · Mar 7, 2026 · Updated last week