NVIDIA Inference Xfer Library (NIXL)
☆945 · Mar 20, 2026 · Updated last week
Alternatives and similar repositories for nixl
Users interested in nixl are comparing it to the libraries listed below.
- A Datacenter Scale Distributed Inference Serving Framework ☆6,347 · Mar 20, 2026 · Updated last week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,953 · Updated this week
- Perplexity GPU Kernels ☆564 · Nov 7, 2025 · Updated 4 months ago
- KV cache store for distributed LLM inference ☆400 · Nov 13, 2025 · Updated 4 months ago
- FlashInfer: Kernel Library for LLM Serving ☆5,194 · Updated this week
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,273 · Aug 28, 2025 · Updated 6 months ago
- A lightweight design for computation-communication overlap. ☆225 · Jan 20, 2026 · Updated 2 months ago
- Distributed Compiler based on Triton for Parallel Systems ☆1,394 · Mar 11, 2026 · Updated 2 weeks ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆95 · Jan 16, 2026 · Updated 2 months ago
- Efficient and easy multi-instance LLM serving ☆536 · Mar 12, 2026 · Updated 2 weeks ago
- Disaggregated serving system for Large Language Models (LLMs). ☆792 · Apr 6, 2025 · Updated 11 months ago
- Supercharge Your LLM with the Fastest KV Cache Layer ☆7,745 · Updated this week
- A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology ☆1,355 · Mar 12, 2026 · Updated 2 weeks ago
- UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g… ☆1,240 · Mar 20, 2026 · Updated last week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆490 · Mar 20, 2026 · Updated last week
- Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group) ☆1,599 · Updated this week
- Optimized primitives for collective multi-GPU communication ☆4,562 · Updated this week
- A low-latency & high-throughput serving engine for LLMs ☆486 · Jan 8, 2026 · Updated 2 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆466 · May 30, 2025 · Updated 9 months ago
- DeepEP: an efficient expert-parallel communication library ☆9,053 · Feb 9, 2026 · Updated last month
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆24,829 · Updated this week
- A throughput-oriented high-performance serving framework for LLMs ☆950 · Oct 29, 2025 · Updated 4 months ago
- ☆358 · Jan 28, 2026 · Updated last month
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆1,074 · Updated this week
- An experimental communicating attention kernel based on DeepEP. ☆35 · Jul 29, 2025 · Updated 7 months ago
- Achieve state-of-the-art inference performance with modern accelerators on Kubernetes ☆2,657 · Updated this week
- Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆404 · Updated this week
- Expert Parallelism Load Balancer ☆1,357 · Mar 24, 2025 · Updated last year
- A Rust reimplementation of genai-bench for benchmarking LLM serving systems at high concurrency with accurate timing and industry-standar… ☆284 · Updated this week
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆172 · Feb 11, 2026 · Updated last month
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆145 · Mar 19, 2026 · Updated last week
- Analyze computation-communication overlap in V3/R1. ☆1,149 · Mar 21, 2025 · Updated last year
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆2,159 · Mar 19, 2026 · Updated last week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ☆3,231 · Updated this week
- Materials for learning SGLang ☆785 · Jan 5, 2026 · Updated 2 months ago
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerator kernels ☆5,403 · Mar 20, 2026 · Updated last week
- ☆65 · Apr 26, 2025 · Updated 11 months ago
- DeepSeek-V3/R1 inference performance simulator ☆189 · Mar 27, 2025 · Updated last year
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆3,958 · Updated this week