ai-dynamo / aiconfigurator
Offline optimization of your disaggregated Dynamo graph
☆88 Updated this week
Alternatives and similar repositories for aiconfigurator
Users who are interested in aiconfigurator are comparing it to the libraries listed below.
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆114 Updated 5 months ago
- A lightweight design for computation-communication overlap. ☆182 Updated 3 weeks ago
- Efficient and easy multi-instance LLM serving ☆504 Updated 2 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆142 Updated last month
- NCCL Profiling Kit ☆145 Updated last year
- A low-latency & high-throughput serving engine for LLMs ☆431 Updated 2 weeks ago
- NVIDIA Inference Xfer Library (NIXL) ☆688 Updated this week
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆121 Updated last year
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆62 Updated last year
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆65 Updated last week
- Perplexity GPU Kernels ☆513 Updated last week
- ☆46 Updated 10 months ago
- Fast and memory-efficient exact attention ☆96 Updated 2 weeks ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆225 Updated last week
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆265 Updated 2 months ago
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆228 Updated last week
- ☆67 Updated 9 months ago
- DeepSeek-V3/R1 inference performance simulator ☆170 Updated 7 months ago
- ☆144 Updated 10 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆427 Updated last week
- KV cache store for distributed LLM inference ☆351 Updated last month
- RDMA and SHARP plugins for nccl library ☆211 Updated last week
- Microsoft Collective Communication Library ☆67 Updated 11 months ago
- ☆101 Updated last year
- ☆90 Updated 7 months ago
- ☆74 Updated 2 weeks ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆432 Updated 5 months ago
- Stateful LLM Serving ☆87 Updated 7 months ago
- NVIDIA NCCL Tests for Distributed Training ☆118 Updated this week
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆131 Updated last year