microsoft/NPKit

microsoft / NPKit

NCCL Profiling Kit

☆155

Alternatives and similar repositories for NPKit

Users who are interested in NPKit are comparing it to the repositories listed below.

Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library
☆233 · Apr 3, 2026 · Updated 3 months ago
microsoft / msccl
Microsoft Collective Communication Library
☆394 · Sep 20, 2023 · Updated 2 years ago
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆541 · Updated this week
microsoft / msccl-tools
Synthesizer for optimal collective communication algorithms
☆125 · Apr 8, 2024 · Updated 2 years ago
Azure / msccl-executor-nccl
☆47 · Dec 13, 2024 · Updated last year
Azure / msccl
Microsoft Collective Communication Library
☆66 · Nov 23, 2024 · Updated last year
microsoft / taccl
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
☆83 · Jul 25, 2023 · Updated 2 years ago
google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
☆125 · Nov 15, 2023 · Updated 2 years ago
ROCm / rccl
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆419 · Updated this week
openucx / ucc
Unified Collective Communication Library
☆310 · Updated this week
aws / aws-ofi-nccl
This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications.
☆228 · Updated this week
Oneflow-Inc / dfccl
☆26 · Feb 17, 2025 · Updated last year
astra-sim / tacos
TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning
☆37 · Jun 13, 2025 · Updated last year
spcl / muliticast-based-allgather
☆24 · Feb 12, 2025 · Updated last year
openucx / torch-ucc
pytorch ucc plugin
☆23 · Jul 8, 2021 · Updated 5 years ago
sii-research / VCCL
Venus Collective Communication Library, supported by SII and Infrawaves.
☆151 · Jun 24, 2026 · Updated 3 weeks ago
ROCm / rocSHMEM
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆146 · Updated this week
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆377 · Updated this week
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,493 · Jul 11, 2026 · Updated last week
uxlfoundation / oneCCL
oneAPI Collective Communications Library (oneCCL)
☆268 · Updated this week
phoenix-dataplane / mCCS
Managed collective communication service
☆24 · Sep 2, 2024 · Updated last year
parasailteam / coconet
☆85 · Dec 2, 2022 · Updated 3 years ago
mcrl / tccl
Thunder Research Group's Collective Communication Library
☆53 · Jul 8, 2025 · Updated last year
microsoft / TE-CCL
☆56 · Aug 27, 2024 · Updated last year
NVIDIA / nccl-tests
NCCL Tests
☆1,595 · Jul 9, 2026 · Updated last week
mellanox-hpc / libibprof
verbs profiling library
☆22 · Sep 22, 2023 · Updated 2 years ago
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,343 · Aug 28, 2025 · Updated 10 months ago
NVIDIA / gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,399 · Updated this week
gpudirect / libgdsync
GPUDirect Async support for IB Verbs
☆139 · Nov 10, 2022 · Updated 3 years ago
NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆4,892 · Updated this week
antgroup / DeepXTrace
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆100 · Jan 16, 2026 · Updated 6 months ago
NVIDIA / nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
☆732 · Apr 8, 2026 · Updated 3 months ago
facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆535 · May 29, 2026 · Updated last month
ngimel / nccl.torch
torch bindings for nccl
☆28 · Apr 29, 2018 · Updated 8 years ago
ROCm / rccl-tests
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆92 · Updated this week
astra-sim / astra-sim
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
☆641 · Apr 25, 2026 · Updated 2 months ago
inclusionAI / asystem-amem
A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.
☆110 · Dec 17, 2025 · Updated 7 months ago
Mellanox / nv_peer_memory
☆399 · Apr 23, 2024 · Updated 2 years ago
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,138 · Updated this week