sii-research / VCCL
Venus Collective Communication Library, supported by SII and Infrawaves.
☆114 · Updated 3 weeks ago
Alternatives and similar repositories for VCCL
Users interested in VCCL are comparing it to the libraries listed below.
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆144 · Updated 2 months ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆82 · Updated this week
- A lightweight design for computation-communication overlap. ☆188 · Updated last month
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆116 · Updated 6 months ago
- ☆320 · Updated 2 weeks ago
- High-performance Transformer implementation in C++. ☆142 · Updated 10 months ago
- DeepSeek-V3/R1 inference performance simulator ☆168 · Updated 8 months ago
- ☆47 · Updated 11 months ago
- ☆90 · Updated 7 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆119 · Updated last year
- ☆128 · Updated last week
- ⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance.⚡️ ☆129 · Updated 6 months ago
- Tile-based language built for AI computation across all scales ☆82 · Updated this week
- ☆45 · Updated 7 months ago
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin. ☆76 · Updated 2 weeks ago
- Efficient Compute-Communication Overlap for Distributed LLM Inference ☆62 · Updated last month
- ☆112 · Updated 6 months ago
- ☆65 · Updated 7 months ago
- Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of pap… ☆279 · Updated 8 months ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆62 · Updated 2 months ago
- Aims to implement dual-port and multi-QP solutions in the DeepEP IBRC transport ☆67 · Updated 6 months ago
- Stateful LLM Serving ☆89 · Updated 8 months ago
- Microsoft Collective Communication Library ☆66 · Updated last year
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆68 · Updated 3 weeks ago
- ☆102 · Updated last year
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆90 · Updated 5 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆119 · Updated 2 months ago
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling ☆51 · Updated last week
- Thunder Research Group's Collective Communication Library ☆42 · Updated 4 months ago
- Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend ☆84 · Updated last week