mcrl/tccl

Thunder Research Group's Collective Communication Library

☆53

Alternatives and similar repositories for tccl

Users who are interested in tccl are comparing it to the libraries listed below.

Oneflow-Inc / dfccl
☆27 · Feb 17, 2025 · Updated last year
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆542 · Updated this week
merthidayetoglu / HiCCL
A hierarchical collective communications library with portable optimizations
☆38 · Dec 8, 2024 · Updated last year
microsoft / NPKit
NCCL Profiling Kit
☆155 · Jul 1, 2024 · Updated 2 years ago
microsoft / cusync
☆27 · Feb 20, 2024 · Updated 2 years ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆242 · Jan 20, 2026 · Updated 6 months ago
microsoft / msccl
Microsoft Collective Communication Library
☆394 · Sep 20, 2023 · Updated 2 years ago
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for the NCCL library
☆233 · Apr 3, 2026 · Updated 3 months ago
ColfaxResearch / cutlass-kernels
☆270 · Jul 11, 2024 · Updated 2 years ago
osayamenja / FlashMoE
Distributed MoE in a Single Kernel [NeurIPS '25]
☆273 · May 5, 2026 · Updated 2 months ago
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,345 · Aug 28, 2025 · Updated 10 months ago
shenh10 / DeepSeek_Simulator
☆100 · Apr 2, 2025 · Updated last year
infinigence / FUSCO
High-performance distributed data shuffling (all-to-all) library for MoE training and inference
☆123 · Mar 7, 2026 · Updated 4 months ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
DeepLink-org / DLSlime
Composable and Embeddable Communication Runtime for Distributed AI Services
☆102 · Jun 5, 2026 · Updated last month
stepfun-ai / StepMesh
☆379 · Jan 28, 2026 · Updated 5 months ago
mpi-advance / locality_aware
Collective and Neighbor Collective Optimizations and Extensions
☆15 · Jul 14, 2026 · Updated last week
microsoft / TE-CCL
☆56 · Aug 27, 2024 · Updated last year
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆593 · Nov 7, 2025 · Updated 8 months ago
ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆193 · Updated this week
SJTU-IPADS / MetaAttention
MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends (PPoPP'26)
☆16 · Dec 31, 2025 · Updated 6 months ago
facebookexperimental / triton
GitHub mirror of triton-lang/triton repo.
☆181 · Updated this week
yifuwang / symm-mem-recipes
☆170 · Dec 27, 2024 · Updated last year
openucx / xccl
☆26 · May 19, 2021 · Updated 5 years ago
ademeure / cuda-side-boost
☆60 · Feb 24, 2026 · Updated 5 months ago
muriloboratto / NVSHEMEM
Sample Codes using NVSHMEM on Multi-GPU
☆30 · Jan 22, 2023 · Updated 3 years ago
ROCm / rocSHMEM
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆146 · Updated this week
NVlabs / mixedproxy
☆15 · Nov 14, 2023 · Updated 2 years ago
cchan / tccl
extensible collectives library in triton
☆97 · Mar 31, 2025 · Updated last year
UofT-EcoSystem / Minuet
[EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs
☆80 · Jun 7, 2024 · Updated 2 years ago
LeiWang1999 / AutoGPTQ.tvm
GPTQ inference TVM kernel
☆41 · Apr 25, 2024 · Updated 2 years ago
NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆909 · Sep 26, 2025 · Updated 10 months ago
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆445 · Mar 5, 2026 · Updated 4 months ago
uxlfoundation / oneCCL
oneAPI Collective Communications Library (oneCCL)
☆268 · Updated this week
uccl-project / uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g…
☆1,471 · Updated this week
lightsighter / CudaDMA
Emulating DMA Engines on GPUs for Performance and Portability
☆43 · May 17, 2015 · Updated 11 years ago
openucx / ucc
Unified Collective Communication Library
☆311 · Jul 17, 2026 · Updated last week
infinigence / HamiltonAttention
☆45 · Oct 15, 2025 · Updated 9 months ago
weiya711 / sam
☆18 · Oct 17, 2025 · Updated 9 months ago