facebookincubator / dynolog
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from system components such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
☆270 · Updated last week
Related projects
Alternatives and complementary repositories for dynolog
- CUDA checkpoint and restore utility ☆226 · Updated 7 months ago
- DCPerf benchmark suite for hyperscale cloud applications ☆139 · Updated this week
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆112 · Updated last year
- cricket is a virtualization solution for GPUs ☆153 · Updated 10 months ago
- An I/O benchmark for deep learning applications ☆69 · Updated 3 weeks ago
- NCCL Profiling Kit ☆112 · Updated 4 months ago
- A library to analyze PyTorch traces. ☆308 · Updated this week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆250 · Updated this week
- System performance analysis and characterization tool ☆343 · Updated this week
- Efficient and easy multi-instance LLM serving ☆216 · Updated this week
- NVIDIA GPUDirect Storage Driver ☆203 · Updated this week
- DAMON user-space tool ☆155 · Updated 2 months ago
- Hooks CUDA-related dynamic libraries using automated code generation tools. ☆139 · Updated 11 months ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆321 · Updated last month
- The local version of the backend and UI for the gProfiler agent, featuring advanced flamegraph analysis tools. For the also free cloud ve… ☆172 · Updated this week
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs ☆416 · Updated this week
- MLPerf™ Storage Benchmark Suite ☆98 · Updated 3 months ago
- ☆214 · Updated this week
- RDMA and SHARP plugins for the NCCL library ☆162 · Updated last week
- Meta's fleetwide profiler framework ☆46 · Updated 3 weeks ago
- Unified Collective Communication Library ☆207 · Updated last week
- Splits a single NVIDIA GPU into multiple partitions with complete compute and memory isolation (with respect to performance) between the partitions ☆153 · Updated 5 years ago
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆73 · Updated 7 months ago
- ☆273 · Updated 3 months ago
- Automatic tuning for ML model deployment on Kubernetes ☆80 · Updated 3 weeks ago
- Senpai is an automated memory sizing tool for container applications. ☆303 · Updated 9 months ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆124 · Updated this week
- Magnum IO community repo ☆79 · Updated 5 months ago
- Fine-grained GPU sharing primitives ☆140 · Updated 4 years ago
- Microsoft Collective Communication Library ☆54 · Updated last month