microsoft / mscclLinks

Microsoft Collective Communication Library

☆352

Alternatives and similar repositories for msccl

Users that are interested in msccl are comparing it to the libraries listed below

Sorting:

microsoft / msccl-tools
Synthesizer for optimal collective communication algorithms
☆110Updated last year
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆390Updated this week
microsoft / NPKit
NCCL Profiling Kit
☆139Updated last year
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library
☆199Updated last month
uccl-project / uccl
Ultra and Unified CCL
☆440Updated this week
parasailteam / coconet
☆80Updated 2 years ago
google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
☆118Updated last year
microsoft / taccl
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
☆74Updated 2 years ago
mlcommons / chakra
Repository for MLCommons Chakra schema and tools
☆114Updated last month
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆144Updated 3 years ago
Azure / msccl
Microsoft Collective Communication Library
☆63Updated 8 months ago
facebookresearch / param
PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for…
☆147Updated last week
mcrl / tccl
Thunder Research Group's Collective Communication Library
☆39Updated 3 weeks ago
calculon-ai / calculon
☆145Updated last year
AlibabaPAI / DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Model
☆77Updated 4 years ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆114Updated 3 weeks ago
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆60Updated last year
NVIDIA / nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
☆492Updated 3 months ago
zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆156Updated 4 months ago
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆142Updated 6 months ago
openucx / ucc
Unified Collective Communication Library
☆262Updated this week
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆96Updated 2 years ago
microsoft / SuperScaler
An experimental parallel training platform
☆54Updated last year
SymbioticLab / Salus
Fine-grained GPU sharing primitives
☆143Updated this week
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆67Updated 4 months ago
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆83Updated 2 years ago
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆189Updated last week
Mellanox / nv_peer_memory
☆361Updated last year
yifuwang / symm-mem-recipes
☆101Updated 7 months ago
Mellanox / gpu_direct_rdma_access
example code for using DC QP for providing RDMA READ and WRITE operations to remote GPU memory
☆137Updated last year