paperg / NCCL_GP

Decoupled from hardware; used to learn some NCCL mechanisms

☆24

Alternatives and similar repositories for NCCL_GP

Users who are interested in NCCL_GP are comparing it to the repositories listed below.

aliyun / aicb
☆226 Updated 2 months ago
StarryVae / RDMA-tutorial
☆212 Updated 2 years ago
Mellanox / gpu_direct_rdma_access
Example code for using DC QP to provide RDMA READ and WRITE operations to remote GPU memory
☆148 Updated last year
spcl / muliticast-based-allgather
☆21 Updated 9 months ago
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for the NCCL library
☆215 Updated 2 weeks ago
zhangmenghao / RDMA-Tutorial
☆30 Updated last year
microsoft / NPKit
NCCL Profiling Kit
☆149 Updated last year
aliyun / ns-3-alibabacloud
☆71 Updated 6 months ago
mlcommons / chakra
Repository for MLCommons Chakra schema and tools
☆142 Updated last month
aliyun / SimCCL
☆42 Updated last year
zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆169 Updated 8 months ago
microsoft / taccl
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
☆78 Updated 2 years ago
animeshtrivedi / rdma-example
RDMA example
☆227 Updated 3 years ago
ASISys / Adrenaline
Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation
☆39 Updated 3 weeks ago
astra-sim / astra-sim
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
☆477 Updated this week
pkusys / TGS
Artifacts for our NSDI'23 paper TGS
☆91 Updated last year
Li-Weihang / python_rdma_test
This is an RDMA program written in Python, based on the Pyverbs provided by the rdma-core(https://github.com/linux-rdma/rdma-core) reposi…
☆34 Updated 3 years ago
NVIDIA / doroce-linux
A command line utility to manage the configuration of a system's high performance network interfaces for RoCE deployments
☆33 Updated 2 years ago
NVIDIA-DOCA / gpunetio
Open source version of DOCA GPUNetIO and DOCA Verbs libraries (limited features) to enable GDAKI technology on RDMA (IB and RoCE)
☆19 Updated 2 months ago
HaifengSun-Kira / RDMA-Tutorial
☆38 Updated 3 years ago
Bruce-Lee-LY / cuda_hook
Hooks CUDA-related dynamic libraries by using automated code generation tools.
☆172 Updated last year
shenh10 / DeepSeek_Simulator
☆90 Updated 8 months ago
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆153 Updated last week
calculon-ai / calculon
☆160 Updated last year
spcl / atlahs
ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage
☆54 Updated 3 weeks ago
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆165 Updated this week
casys-kaist / glet
☆53 Updated 11 months ago
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆137 Updated last month
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆103 Updated 2 years ago
gbxu / autoccl
[NSDI25] Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training
☆28 Updated 7 months ago