microsoft / msccl
Microsoft Collective Communication Library (☆317)
Related projects
Alternatives and complementary repositories for msccl
- MSCCL++: A GPU-driven communication stack for scalable AI applications (☆246)
- Synthesizer for optimal collective communication algorithms (☆98)
- NCCL Profiling Kit (☆109)
- RDMA and SHARP plugins for the NCCL library (☆160)
- NCCL Fast Socket: a transport-layer plugin that improves NCCL collective communication performance on Google Cloud (☆113)
- TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches (☆63)
- A baseline repository for auto-parallelism in training neural networks (☆142)
- Microsoft Collective Communication Library (☆51)
- ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale (☆267)
- Repository for MLCommons Chakra schema and tools (☆64)
- PArametrized Recommendation and AI Model benchmark: a repository for developing numerous uBenchmarks as well as end-to-end nets for… (☆122)
- A fast communication-overlapping library for tensor parallelism on GPUs (☆217)
- A tool for bandwidth measurements on NVIDIA GPUs (☆316)
- nnScaler: Compiling DNN models for Parallel Training (☆62)
- Unified Collective Communication Library (☆205)
- Curated collection of papers in machine learning systems (☆156)
- An experimental parallel training platform (☆52)
- An efficient pipelined data-parallel approach for training large models (☆70)
- A validation and profiling tool for AI infrastructure (☆270)
- Assembler for NVIDIA Volta and Turing GPUs (☆200)
- ROCm Communication Collectives Library (RCCL) (☆267)
- Dynamic Memory Management for Serving LLMs without PagedAttention (☆222)
- Shared Middle-Layer for Triton Compilation (☆185)
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling (☆57)
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23) (☆79)
- Automated Parallelization System and Infrastructure for Multiple Ecosystems (☆75)