Azure / msccl

Microsoft Collective Communication Library

☆66

Alternatives and similar repositories for msccl

Users who are interested in msccl are comparing it to the libraries listed below.

microsoft / NPKit
NCCL Profiling Kit
☆145 Updated last year
mcrl / tccl
Thunder Research Group's Collective Communication Library
☆42 Updated 3 months ago
parasailteam / coconet
☆83 Updated 2 years ago
Azure / msccl-executor-nccl
☆46 Updated 10 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆181 Updated last week
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆80 Updated 11 months ago
microsoft / msccl-tools
Synthesizer for optimal collective communication algorithms
☆118 Updated last year
microsoft / SuperScaler
An experimental parallel training platform
☆54 Updated last year
hao-ai-lab / MuxServe
☆72 Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆88 Updated 2 years ago
shenh10 / DeepSeek_Simulator
☆90 Updated 6 months ago
facebookresearch / param
PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for…
☆153 Updated last week
WukLab / preble
Stateful LLM Serving
☆86 Updated 7 months ago
awslabs / optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
☆20 Updated last year
microsoft / taccl
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
☆76 Updated 2 years ago
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆62 Updated last year
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆129 Updated last year
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, “Llumnix: Dynamic Scheduling for Large Language Model Serving”
☆62 Updated last year
zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆170 Updated 6 months ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆138 Updated last month
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆147 Updated 8 months ago
SymbioticLab / Oobleck
A resilient distributed training framework
☆95 Updated last year
microsoft / msccl
Microsoft Collective Communication Library
☆367 Updated 2 years ago
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆425 Updated this week
flexflow / flexflow-serve
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
☆63 Updated last month
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆66 Updated 7 months ago
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46 Updated last year
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆51 Updated 2 years ago
DeepLink-org / DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆68 Updated last week
google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
☆121 Updated last year