facebookincubator/dynolog

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.

☆375

Alternatives and similar repositories for dynolog

Users that are interested in dynolog are comparing it to the libraries listed below.

facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆535 · May 29, 2026 · Updated last month
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆974 · Updated this week
facebookresearch / param
PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for…
☆155 · Jul 2, 2026 · Updated 2 weeks ago
facebookincubator / strobelight
Meta's fleetwide profiler framework
☆349 · Jul 7, 2026 · Updated last week
microsoft / NPKit
NCCL Profiling Kit
☆155 · Jul 1, 2024 · Updated 2 years ago
leptonai / gpud
GPUd automates monitoring, diagnostics, and issue identification for GPUs
☆486 · Updated this week
google / CoMMA
☆24 · Jun 29, 2026 · Updated 3 weeks ago
openteams-ai / torch-build
Collection of scripts to build PyTorch and the domain libraries from source.
☆14 · Jul 9, 2026 · Updated last week
aztecher / bdc
BDC is the eBPF powered DNS caching mechanism in kernel inspired by BMC
☆10 · May 13, 2022 · Updated 4 years ago
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆222 · Sep 21, 2024 · Updated last year
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,139 · Updated this week
meta-pytorch / MSLK
MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr…
☆121 · Updated this week
meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆523 · Updated this week
NVIDIA / DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
☆763 · Jul 6, 2026 · Updated 2 weeks ago
parca-dev / parcagpu
CUPTI based GPU profiling library exposing usdt hooks
☆37 · Jun 30, 2026 · Updated 2 weeks ago
GPUprobe / gpuprobe-daemon
Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes
☆152 · Mar 29, 2025 · Updated last year
rs3lab / Odinfs
☆18 · May 16, 2022 · Updated 4 years ago
NVIDIA / cuda-checkpoint
CUDA checkpoint and restore utility
☆474 · Jul 6, 2026 · Updated 2 weeks ago
gpu-os / GPU-CR
GPU-CR: GPU Checkpoint & Restore
☆25 · Jun 4, 2026 · Updated last month
eunomia-bpf / nccl-eBPF
☆20 · Jul 7, 2026 · Updated 2 weeks ago
NVIDIA / nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
☆734 · Updated this week
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆901 · Updated this week
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆7,540 · Updated this week
microsoft / msccl
Microsoft Collective Communication Library
☆394 · Sep 20, 2023 · Updated 2 years ago
ROCm / TransferBench
TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)
☆74 · Updated this week
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆541 · Updated this week
ROCm / rocprofiler
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆152 · May 28, 2026 · Updated last month
pytorch / gloo
Collective communications library with various primitives for multi-machine training.
☆1,438 · Jul 1, 2026 · Updated 2 weeks ago
NVIDIA / NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…
☆547 · Updated this week
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆511 · Jan 8, 2026 · Updated 6 months ago
ech-o-o / AisLSM
☆13 · Feb 6, 2026 · Updated 5 months ago
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆211 · Updated this week
vortexgpgpu / Volt
☆17 · Feb 9, 2026 · Updated 5 months ago
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,925 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,988 · Updated this week
SJTU-IPADS / PhoenixOS
Fast OS-level support for GPU checkpoint and restore
☆286 · Sep 28, 2025 · Updated 9 months ago
volcengine / veScale
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
☆1,031 · Mar 3, 2026 · Updated 4 months ago
NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆4,893 · Updated this week
NVlabs / nvbitfi
Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation
☆84 · Oct 17, 2023 · Updated 2 years ago