triton-inference-server / triton_distributed
☆50 · Updated 3 months ago
Alternatives and similar repositories for triton_distributed
Users who are interested in triton_distributed are comparing it to the libraries listed below.
- NVIDIA Inference Xfer Library (NIXL) · ☆422 · Updated this week
- Efficient and easy multi-instance LLM serving · ☆437 · Updated this week
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … · ☆179 · Updated 2 weeks ago
- A low-latency & high-throughput serving engine for LLMs · ☆380 · Updated 3 weeks ago
- Perplexity GPU Kernels · ☆375 · Updated 2 weeks ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. · ☆87 · Updated last month
- KV cache store for distributed LLM inference · ☆269 · Updated 2 weeks ago
- NVIDIA NCCL Tests for Distributed Training · ☆97 · Updated last week
- MSCCL++: A GPU-driven communication stack for scalable AI applications · ☆379 · Updated this week
- Dynamic Memory Management for Serving LLMs without PagedAttention · ☆397 · Updated 3 weeks ago
- ☆90 · Updated 5 months ago
- Ultra and Unified CCL · ☆165 · Updated this week
- ☆26 · Updated 3 months ago
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. · ☆117 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆76 · Updated this week
- DeepSeek-V3/R1 inference performance simulator · ☆149 · Updated 2 months ago
- NCCL Profiling Kit · ☆138 · Updated 11 months ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling · ☆59 · Updated last year
- PyTorch distributed training acceleration framework · ☆49 · Updated 4 months ago
- A tool for bandwidth measurements on NVIDIA GPUs. · ☆464 · Updated 2 months ago
- ☆81 · Updated last week
- Experimental projects related to TensorRT · ☆105 · Updated last week
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of… · ☆152 · Updated this week
- Microsoft Collective Communication Library · ☆350 · Updated last year
- ☆194 · Updated last month
- A library to analyze PyTorch traces. · ☆391 · Updated last week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") · ☆337 · Updated this week
- Common source, scripts and utilities for creating Triton backends. · ☆328 · Updated last week
- Zero Bubble Pipeline Parallelism · ☆398 · Updated last month
- A collection of memory efficient attention operators implemented in the Triton language. · ☆272 · Updated last year