sii-research / VCCL
Venus Collective Communication Library, supported by SII and Infrawaves.
☆107 Updated last week
Alternatives and similar repositories for VCCL
Users who are interested in VCCL are comparing it to the libraries listed below.
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆114 Updated 5 months ago
- A lightweight design for computation-communication overlap. ☆182 Updated 3 weeks ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆142 Updated last month
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆73 Updated last week
- ☆310 Updated last month
- High performance Transformer implementation in C++. ☆139 Updated 9 months ago
- ☆46 Updated 10 months ago
- ☆90 Updated 7 months ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆124 Updated 5 months ago
- ☆122 Updated this week
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆119 Updated last year
- DeepSeek-V3/R1 inference performance simulator ☆170 Updated 7 months ago
- ☆63 Updated last month
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆63 Updated last month
- Stateful LLM Serving ☆87 Updated 7 months ago
- Fast and memory-efficient exact attention ☆96 Updated 2 weeks ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆65 Updated last week
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap… ☆278 Updated 7 months ago
- Aims to implement dual-port and multi-qp solutions in DeepEP ibrc transport ☆66 Updated 5 months ago
- Efficient Compute-Communication Overlap for Distributed LLM Inference ☆61 Updated this week
- ☆97 Updated 7 months ago
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling ☆42 Updated last month
- ☆67 Updated 9 months ago
- ☆74 Updated 2 weeks ago
- ☆101 Updated last year
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆81 Updated 4 months ago
- Fast OS-level support for GPU checkpoint and restore ☆252 Updated last month
- DeeperGEMM: crazy optimized version ☆72 Updated 5 months ago
- ☆43 Updated 6 months ago
- ☆107 Updated 5 months ago