triton-inference-server / triton_distributed
☆49 · Updated 2 weeks ago
Alternatives and similar repositories for triton_distributed:
Users interested in triton_distributed are comparing it to the libraries listed below.
- NVIDIA Inference Xfer Library (NIXL) ☆191 · Updated this week
- Efficient and easy multi-instance LLM serving ☆348 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆85 · Updated last week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆315 · Updated this week
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆109 · Updated this week
- A low-latency and high-throughput serving engine for LLMs ☆327 · Updated last month
- CUDA checkpoint and restore utility ☆315 · Updated 2 months ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆392 · Updated last month
- ☆51 · Updated this week
- RDMA and SHARP plugins for the NCCL library ☆184 · Updated this week
- NCCL Fast Socket is a transport-layer plugin that improves NCCL collective communication performance on Google Cloud. ☆116 · Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆326 · Updated this week
- Hooks CUDA-related dynamic libraries using automated code-generation tools. ☆150 · Updated last year
- ☆57 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆62 · Updated this week
- NCCL Profiling Kit ☆128 · Updated 8 months ago
- Experimental projects related to TensorRT ☆94 · Updated this week
- A library to analyze PyTorch traces. ☆350 · Updated last week
- Triton Model Analyzer is a CLI tool for understanding the compute and memory requirements of the Triton Inference Serv… ☆462 · Updated 2 weeks ago
- Microsoft Collective Communication Library ☆60 · Updated 4 months ago
- Microsoft Collective Communication Library ☆343 · Updated last year
- AI Accelerator Benchmark focuses on evaluating AI accelerators from a practical production perspective, including ease of use and ver… ☆233 · Updated this week
- The core library and APIs implementing the Triton Inference Server. ☆123 · Updated this week
- HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of… ☆141 · Updated 3 weeks ago
- ☆296 · Updated 7 months ago
- Disaggregated serving system for Large Language Models (LLMs). ☆507 · Updated 7 months ago
- Common source, scripts and utilities for creating Triton backends. ☆311 · Updated this week
- Applied AI experiments and examples for PyTorch ☆250 · Updated last week
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆92 · Updated last year
- GLake: optimizing GPU memory management and IO transmission. ☆449 · Updated this week