NVIDIA/nccl-tests

NCCL Tests

☆1,595

Alternatives and similar repositories for nccl-tests

Users that are interested in nccl-tests are comparing it to the libraries listed below.

NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆4,893 · Updated this week
NVIDIA / nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
☆734 · Updated this week
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for the NCCL library
☆233 · Apr 3, 2026 · Updated 3 months ago
linux-rdma / perftest
Infiniband Verbs Performance Tests
☆999 · Jul 12, 2026 · Updated last week
NVIDIA / gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,399 · Updated this week
microsoft / msccl
Microsoft Collective Communication Library
☆394 · Sep 20, 2023 · Updated 2 years ago
Mellanox / nv_peer_memory
☆399 · Apr 23, 2024 · Updated 2 years ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,435 · Updated this week
pytorch / gloo
Collective communications library with various primitives for multi-machine training.
☆1,438 · Jul 1, 2026 · Updated 2 weeks ago
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆17,125 · Updated this week
google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
☆125 · Nov 15, 2023 · Updated 2 years ago
NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆908 · Sep 26, 2025 · Updated 9 months ago
microsoft / NPKit
NCCL Profiling Kit
☆155 · Jul 1, 2024 · Updated 2 years ago
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,344 · Aug 28, 2025 · Updated 10 months ago
coreweave / nccl-tests
NVIDIA NCCL Tests for Distributed Training
☆149 · Jul 8, 2026 · Updated last week
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆541 · Updated this week
aws / aws-ofi-nccl
This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications.
☆228 · Updated this week
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,988 · Updated this week
NVIDIA / cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
☆9,406 · May 27, 2026 · Updated last month
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,494 · Updated this week
NVIDIA / DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
☆763 · Jul 6, 2026 · Updated 2 weeks ago
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆974 · Updated this week
openucx / ucx
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
☆1,673 · Updated this week
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,139 · Updated this week
microsoft / msccl-tools
Synthesizer for optimal collective communication algorithms
☆125 · Apr 8, 2024 · Updated 2 years ago
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,925 · Updated this week
openucx / ucc
Unified Collective Communication Library
☆310 · Updated this week
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆560 · Updated this week
ROCm / rccl
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆419 · Updated this week
deepspeedai / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆2,257 · Aug 14, 2025 · Updated 11 months ago
ROCm / rccl-tests
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆92 · Updated this week
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆377 · Updated this week
Mellanox / k8s-rdma-shared-dev-plugin
☆375 · Updated this week
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆901 · Updated this week
volcengine / veScale
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
☆1,031 · Mar 3, 2026 · Updated 4 months ago
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,442 · Mar 27, 2024 · Updated 2 years ago
facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆535 · May 29, 2026 · Updated last month
triton-lang / triton
Development repository for the Triton language and compiler
☆19,738 · Updated this week