Azure/msccl-executor-nccl

☆47

Alternatives and similar repositories for msccl-executor-nccl

Users that are interested in msccl-executor-nccl are comparing it to the libraries listed below.

microsoft / NPKit
NCCL Profiling Kit
☆155 · Jul 1, 2024 · Updated 2 years ago
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library
☆233 · Apr 3, 2026 · Updated 3 months ago
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆542 · Updated this week
aliyun / syccl
☆24 · Sep 10, 2025 · Updated 10 months ago
ROCm / rccl
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆419 · Updated this week
Azure / msccl
Microsoft Collective Communication Library
☆66 · Nov 23, 2024 · Updated last year
zhuzilin / pytorch-malloc
An external memory allocator example for PyTorch.
☆16 · Aug 10, 2025 · Updated 11 months ago
harnets / multiverse
GPU-accelerated LLM Training Simulator
☆22 · Jun 26, 2025 · Updated last year
microsoft / msccl
Microsoft Collective Communication Library
☆394 · Sep 20, 2023 · Updated 2 years ago
google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
☆125 · Nov 15, 2023 · Updated 2 years ago
sii-research / VCCL
Venus Collective Communication Library, supported by SII and Infrawaves.
☆151 · Jun 24, 2026 · Updated 3 weeks ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆504 · Updated this week
feifeibear / PyTorchMemTracer
Depict GPU memory footprint during DNN training of PyTorch
☆11 · Nov 17, 2022 · Updated 3 years ago
lumina-test / lumina
Lumina is a user-friendly tool to test the correctness and performance of hardware network stacks.
☆29 · Jan 8, 2024 · Updated 2 years ago
vllm-project / vllm-nccl
Manages vllm-nccl dependency
☆18 · Jun 3, 2024 · Updated 2 years ago
ROCm / rccl-tests
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆92 · Jul 14, 2026 · Updated last week
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Nov 23, 2024 · Updated last year
feifeibear / PSTensor
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
☆10 · Feb 10, 2022 · Updated 4 years ago
microsoft / taccl
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
☆83 · Jul 25, 2023 · Updated 2 years ago
astra-sim / tacos
TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning
☆37 · Jun 13, 2025 · Updated last year
vllm-project / tml-fa4
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17 · Updated this week
Karbo123 / pytorch_grouped_gemm
High Performance Grouped GEMM in PyTorch
☆30 · May 10, 2022 · Updated 4 years ago
alan-hpc / cuda_op_benchmark
An easily extensible framework for understanding and optimizing CUDA operators, intended for learning purposes only.
☆18 · Jun 13, 2024 · Updated 2 years ago
Infrawaves / DeepEP_ibrc_dual-ports_multiQP
Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport
☆75 · May 9, 2025 · Updated last year
thu-cs-lab / TCP-Lab-Docs
Documentation for TCP Lab
☆12 · May 15, 2026 · Updated 2 months ago
kaist-ina / ns3-tlt-tcp-public
This is an official GitHub repository for the paper, "Towards timeout-less transport in commodity datacenter networks.".
☆17 · Oct 12, 2021 · Updated 4 years ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195 · Feb 11, 2026 · Updated 5 months ago
LeiWang1999 / AutoGPTQ.tvm
GPTQ inference TVM kernel
☆41 · Apr 25, 2024 · Updated 2 years ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆72 · Mar 20, 2025 · Updated last year
lipracer / cuda-rt-hook
☆46 · Jul 16, 2025 · Updated last year
Azure / MS-AMP
Microsoft Automatic Mixed Precision Library
☆636 · Dec 1, 2025 · Updated 7 months ago
xdit-project / DiTCacheAnalysis
An auxiliary project analysis of the characteristics of KV in DiT Attention.
☆34 · Nov 29, 2024 · Updated last year
ryantd / veloce
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
☆17 · Aug 4, 2022 · Updated 3 years ago
Huangxy-Minel / System-Design-for-Federated-Learning
Paper list of federated learning: About system design
☆13 · Apr 13, 2022 · Updated 4 years ago
NVIDIA / cuda-checkpoint
CUDA checkpoint and restore utility
☆474 · Jul 6, 2026 · Updated 2 weeks ago
Bruce-Lee-LY / cutlass_gemm
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆20 · Aug 3, 2025 · Updated 11 months ago
sgl-project / DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆32 · Updated this week
Mellanox / ibdump
☆55 · Feb 1, 2026 · Updated 5 months ago
kazukiosawa / pipe-fisher
☆10 · Apr 29, 2023 · Updated 3 years ago