facebookincubator / dynolog
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
☆258 · Updated this week
Related projects:
- CUDA checkpoint and restore utility ☆193 · Updated 5 months ago
- DCPerf benchmark suite for hyperscale cloud applications ☆116 · Updated last week
- NCCL Profiling Kit ☆104 · Updated 2 months ago
- A library to analyze PyTorch traces. ☆270 · Updated last week
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆108 · Updated 10 months ago
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs ☆379 · Updated 2 weeks ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆233 · Updated this week
- cricket is a virtualization solution for GPUs ☆139 · Updated 8 months ago
- In-kernel cache based on eBPF. ☆440 · Updated 2 years ago
- System performance characterization tool based on Linux perf ☆333 · Updated 3 weeks ago
- DAMON user-space tool ☆153 · Updated 2 weeks ago
- The local version of the backend and UI for the gProfiler agent, featuring advanced flamegraph analysis tools. For the also free cloud ve… ☆171 · Updated last month
- Unified Collective Communication Library ☆190 · Updated this week
- NVIDIA GPUDirect Storage Driver ☆194 · Updated 3 months ago
- A plugin that lets EC2 developers use libfabric as the network provider when running NCCL applications. ☆138 · Updated this week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆285 · Updated 3 months ago
- RDMA and SHARP plugins for the NCCL library ☆154 · Updated this week
- Splits a single NVIDIA GPU into multiple partitions with complete compute and memory isolation (with respect to performance) between the partitions ☆152 · Updated 5 years ago
- MLPerf™ Storage Benchmark Suite ☆76 · Updated last month
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end-to-end nets for… ☆118 · Updated last week
- Demonstrate and benchmark various features of Linux resource control in a self-contained package. ☆143 · Updated last month
- A validation and profiling tool for AI infrastructure ☆252 · Updated this week
- Meta's fleetwide profiler framework ☆32 · Updated last month
- MIG Partition Editor for NVIDIA GPUs ☆163 · Updated this week
- A GPU-driven system framework for scalable AI applications ☆103 · Updated this week
- An I/O benchmark for deep learning applications ☆61 · Updated 2 weeks ago
- Hooks CUDA-related dynamic libraries using automated code generation tools. ☆126 · Updated 9 months ago