uccl-project / ucclLinks

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

☆1,066

Alternatives and similar repositories for uccl

Users that are interested in uccl are comparing it to the libraries listed below

Sorting:

eunomia-bpf / eGPU
Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime)
☆267Updated last week
aliyun / SimAI
☆736Updated 3 weeks ago
Zefan-Cai / KVCache-Factory
Unified KV Cache Compression Methods for Auto-Regressive Models
☆1,277Updated 10 months ago
bytedance / ABQ-LLM
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
☆238Updated last year
zhihu / ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
☆904Updated 4 months ago
CXLMemUring / CXLMemUring
CXL remote offloading data movement aware compiler
☆30Updated last month
Zefan-Cai / R-KV
[Neurips 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
☆1,148Updated last month
ChenmienTan / RL2
☆918Updated this week
hyperai / tvm-cn
TVM Documentation in Chinese Simplified / TVM 中文文档
☆2,653Updated 2 weeks ago
AIoT-MLSys-Lab / SVD-LLM
[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2
☆262Updated 2 months ago
Multi-V-VM / MVVM
Heterogeneous Containerization of Large Language Model Apps
☆107Updated 3 months ago
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆721Updated last week
expert-kit / expert-kit
Expert Kit is an efficient foundation of Expert Parallelism (EP) for MoE model Inference on heterogenous hardware
☆59Updated 3 weeks ago
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆510Updated 2 months ago
ByteDance-Seed / ShadowKV
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
☆272Updated 6 months ago
Qcompiler / MIXQ
MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction
☆94Updated last year
flagos-ai / FlagPerf
FlagPerf is an open-source software platform for benchmarking AI chips.
☆353Updated 2 weeks ago
wqzustc / High-Performance-Tensor-Processing-Engines
Some Hardware Architectures for GEMM
☆283Updated 6 months ago
Multi-V-VM / hetGPU
PTX on XPUs
☆109Updated last week
stepfun-ai / StepMesh
☆320Updated last week
Infini-AI-Lab / TriForce
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
☆271Updated last year
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆529Updated 2 weeks ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆436Updated 5 months ago
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆291Updated this week
AGI-Arena / MARS
The official implementation of MARS: Unleashing the Power of Variance Reduction for Training Large Models
☆713Updated 3 weeks ago
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆363Updated last week
vllm-project / semantic-router
Intelligent Router for Mixture-of-Models
☆2,294Updated this week
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆734Updated 7 months ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆447Updated last month
SJTU-IPADS / PhoenixOS
Fast OS-level support for GPU checkpoint and restore
☆257Updated last month