Thunder Research Group's Collective Communication Library
☆47 Jul 8, 2025 Updated 7 months ago
Alternatives and similar repositories for tccl
Users that are interested in tccl are comparing it to the libraries listed below
- ☆26 Feb 17, 2025 Updated last year
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆475 Updated this week
- NCCL Profiling Kit ☆152 Jul 1, 2024 Updated last year
- A hierarchical collective communications library with portable optimizations ☆37 Dec 8, 2024 Updated last year
- Sample Codes using NVSHMEM on Multi-GPU ☆30 Jan 22, 2023 Updated 3 years ago
- A lightweight design for computation-communication overlap. ☆223 Jan 20, 2026 Updated last month
- Microsoft Collective Communication Library ☆385 Sep 20, 2023 Updated 2 years ago
- RDMA and SHARP plugins for nccl library ☆224 Jan 12, 2026 Updated last month
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆55 Updated this week
- ☆11 Nov 14, 2023 Updated 2 years ago
- ☆262 Jul 11, 2024 Updated last year
- ☆160 Dec 27, 2024 Updated last year
- ☆25 Feb 20, 2024 Updated 2 years ago
- Collective and Neighbor Collective Optimizations and Extensions ☆13 Feb 24, 2026 Updated last week
- ☆26 May 19, 2021 Updated 4 years ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆163 Feb 11, 2026 Updated 3 weeks ago
- ☆87 Updated this week
- ☆53 Feb 24, 2026 Updated last week
- ☆178 May 7, 2025 Updated 9 months ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,261 Aug 28, 2025 Updated 6 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆898 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆145 Feb 23, 2026 Updated last week
- ☆41 Oct 15, 2025 Updated 4 months ago
- GPTQ inference TVM kernel ☆40 Apr 25, 2024 Updated last year
- Emulating DMA Engines on GPUs for Performance and Portability ☆41 May 17, 2015 Updated 10 years ago
- ☆95 Apr 2, 2025 Updated 11 months ago
- [EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs ☆80 Jun 7, 2024 Updated last year
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆869 Sep 26, 2025 Updated 5 months ago
- ☆49 Aug 27, 2024 Updated last year
- ☆17 Oct 17, 2025 Updated 4 months ago
- An Easy-to-understand TensorOp Matmul Tutorial ☆410 Feb 11, 2026 Updated 3 weeks ago
- extensible collectives library in triton ☆96 Mar 31, 2025 Updated 11 months ago
- Unified Collective Communication Library ☆293 Updated this week
- Distributed Compiler based on Triton for Parallel Systems ☆1,371 Feb 13, 2026 Updated 2 weeks ago
- ☆84 Dec 2, 2022 Updated 3 years ago
- Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult… ☆41 Mar 17, 2024 Updated last year
- Managed collective communication service ☆23 Sep 2, 2024 Updated last year
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts". ☆109 Sep 24, 2025 Updated 5 months ago
- NCCL Examples from Official NVIDIA NCCL Developer Guide. ☆20 May 29, 2018 Updated 7 years ago