uccl-project / uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven).
☆1,188 · Updated this week
Alternatives and similar repositories for uccl
Users interested in uccl are comparing it to the libraries listed below.
- Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime) ☆288 · Updated 2 months ago
- ☆803 · Updated 3 weeks ago
- CXLMemSim: a pure-software simulator of CXL.mem for performance characterization ☆504 · Updated this week
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆1,298 · Updated last year
- A highly optimized LLM inference acceleration engine for Llama and its variants. ☆905 · Updated 6 months ago
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆240 · Updated last year
- A data-movement-aware compiler for CXL remote offloading ☆71 · Updated 3 weeks ago
- TVM Documentation in Simplified Chinese / TVM 中文文档 ☆3,110 · Updated 2 months ago
- [NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models ☆1,170 · Updated 3 months ago
- ☆1,089 · Updated last week
- Expert Kit is an efficient foundation for Expert Parallelism (EP) in MoE model inference on heterogeneous hardware ☆61 · Updated this week
- PTX on XPUs ☆119 · Updated last week
- Heterogeneous Containerization of Large Language Model Apps ☆109 · Updated 6 months ago
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆280 · Updated 8 months ago
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆359 · Updated 2 months ago
- ☆140 · Updated 6 months ago
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025 🔥] SVD-LLM V2 ☆275 · Updated 5 months ago
- GLake: optimizing GPU memory management and IO transmission. ☆497 · Updated 10 months ago
- Efficient and easy multi-instance LLM serving ☆523 · Updated 4 months ago
- KV cache store for distributed LLM inference ☆387 · Updated 2 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆844 · Updated this week
- Some Hardware Architectures for GEMM ☆286 · Updated 8 months ago
- ☆340 · Updated 3 weeks ago
- [NeurIPS'25] KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems ☆125 · Updated 2 months ago
- DeepSeek-V3/R1 inference performance simulator ☆177 · Updated 10 months ago
- Fast OS-level support for GPU checkpoint and restore ☆270 · Updated 4 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆455 · Updated this week
- A distributed framework for LLM agents ☆446 · Updated 2 weeks ago
- High-performance Transformer implementation in C++ ☆148 · Updated last year
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding ☆276 · Updated last year