UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
☆1,224 · Feb 28, 2026 · Updated last week
Alternatives and similar repositories for uccl
Users who are interested in uccl are comparing it to the libraries listed below:
- Distributed Compiler based on Triton for Parallel Systems ☆1,371 · Feb 13, 2026 · Updated 3 weeks ago
- NVIDIA Inference Xfer Library (NIXL) ☆898 · Feb 28, 2026 · Updated last week
- ☆347 · Jan 28, 2026 · Updated last month
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,264 · Aug 28, 2025 · Updated 6 months ago
- Perplexity GPU Kernels ☆567 · Nov 7, 2025 · Updated 4 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆165 · Feb 11, 2026 · Updated 3 weeks ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆475 · Feb 28, 2026 · Updated last week
- FlashInfer: Kernel Library for LLM Serving ☆5,057 · Updated this week
- ☆112 · Oct 16, 2025 · Updated 4 months ago
- KV cache store for distributed LLM inference ☆396 · Nov 13, 2025 · Updated 3 months ago
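- 📚 TG-EDU comprehensive education platform | Supports assignment submission 📝, batch grading ✅, late-submission requests 🔄, team collaboration 👥, and grade statistics 📊 ☆111 · Dec 3, 2025 · Updated 3 months ago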
- DeeperGEMM: crazy optimized version ☆74 · May 5, 2025 · Updated 10 months ago
- ☆241 · Dec 25, 2025 · Updated 2 months ago
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,843 · Updated this week
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆2,148 · Feb 23, 2026 · Updated last week
- ☆41 · Oct 15, 2025 · Updated 4 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆464 · May 30, 2025 · Updated 9 months ago
- ☆72 · Oct 18, 2025 · Updated 4 months ago
- NCCL Profiling Kit ☆152 · Jul 1, 2024 · Updated last year
- A lightweight design for computation-communication overlap. ☆223 · Jan 20, 2026 · Updated last month
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆796 · Feb 27, 2026 · Updated last week
- Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs ☆984 · Updated this week
- To help everyone to build their blog to learn ☆49 · Nov 5, 2025 · Updated 4 months ago
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆5,284 · Updated this week
- A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology ☆1,347 · Dec 17, 2025 · Updated 2 months ago
- Attention-based Deep Reinforcement Learning framework for portfolio allocation on S&P 500 equities. Includes custom environment, policy a… ☆163 · Oct 16, 2025 · Updated 4 months ago
- Disaggregated serving system for Large Language Models (LLMs). ☆778 · Apr 6, 2025 · Updated 11 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆947 · Oct 29, 2025 · Updated 4 months ago
- Microsoft Collective Communication Library ☆66 · Nov 23, 2024 · Updated last year
- ☆160 · Dec 27, 2024 · Updated last year
- DeepSeek-V3/R1 inference performance simulator ☆179 · Mar 27, 2025 · Updated 11 months ago
- The config panel for ai sdk. ☆96 · Nov 2, 2025 · Updated 4 months ago
- Tile primitives for speedy kernels ☆3,202 · Feb 24, 2026 · Updated last week
- nnScaler: Compiling DNN models for Parallel Training ☆124 · Sep 23, 2025 · Updated 5 months ago
- ☆816 · Feb 28, 2026 · Updated last week
- Interactively browse multimodal tabular data ☆104 · Feb 11, 2026 · Updated 3 weeks ago
- Optimized primitives for collective multi-GPU communication ☆4,495 · Updated this week
- OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. ☆9,391 · Dec 4, 2025 · Updated 3 months ago
- Efficient and easy multi-instance LLM serving ☆528 · Sep 3, 2025 · Updated 6 months ago