uccl-project / uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (expert parallelism, e.g., GPU-driven communication)
☆1,157 · Updated this week
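For orientation, the sketch below illustrates the two workload classes the description names, collectives and P2P transfers, using standard `torch.distributed` calls. This is illustrative only and is not UCCL's own API; how UCCL integrates with such a stack is not shown here.

```python
"""Illustrative sketch only: standard torch.distributed code showing the
workload classes named in UCCL's description (collectives and P2P transfers).
This is NOT UCCL's own API; UCCL-specific wiring is not shown here."""
import torch
import torch.distributed as dist

def main():
    # Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # Collective: every rank contributes a tensor; all_reduce sums it in place
    # across all GPUs.
    x = torch.full((1024,), float(rank), device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    # P2P: point-to-point send/recv, the pattern behind KV-cache handoff or
    # weight transfer between ranks.
    if dist.get_world_size() >= 2:
        if rank == 0:
            dist.send(x, dst=1)
        elif rank == 1:
            dist.recv(x, src=0)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```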
Alternatives and similar repositories for uccl
Users interested in uccl are comparing it to the libraries listed below.
- Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime) ☆284 · Updated last month
- ☆790 · Updated last week
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆1,295 · Updated last year
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆238 · Updated last year
- A highly optimized LLM inference acceleration engine for Llama and its variants. ☆905 · Updated 5 months ago
- TVM documentation in Simplified Chinese / TVM 中文文档 ☆2,975 · Updated last month
- Data-movement-aware compiler for CXL remote offloading ☆71 · Updated this week
- [NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models ☆1,163 · Updated 2 months ago
- ☆981 · Updated this week
- Expert Kit is an efficient foundation for Expert Parallelism (EP) in MoE model inference on heterogeneous hardware ☆61 · Updated 2 months ago
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 ☆271 · Updated 4 months ago
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆358 · Updated last month
- ☆138 · Updated 5 months ago
- KV cache store for distributed LLM inference ☆378 · Updated last month
- Some Hardware Architectures for GEMM ☆283 · Updated 7 months ago
- Heterogeneous Containerization of Large Language Model Apps ☆109 · Updated 5 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆788 · Updated this week
- [NeurIPS'25] KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems ☆112 · Updated 2 months ago
- GLake: optimizing GPU memory management and IO transmission. ☆494 · Updated 9 months ago
- PTX on XPUs ☆115 · Updated this week
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆278 · Updated 8 months ago
- Efficient and easy multi-instance LLM serving ☆520 · Updated 4 months ago
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction ☆94 · Updated last year
- The official implementation of MARS: Unleashing the Power of Variance Reduction for Training Large Models ☆715 · Updated 2 months ago
- Disaggregated serving system for Large Language Models (LLMs). ☆761 · Updated 9 months ago
- ☆337 · Updated this week
- [ICLR 2025] BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments ☆39 · Updated 10 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆449 · Updated this week
- YiRage (Yield Revolutionary AGile Engine) - Multi-Backend LLM Inference Optimization. Extends Mirage with comprehensive support for CUDA,… ☆37 · Updated last week
- Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your claude code/codex/g… ☆497 · Updated last week