triton-inference-server / triton_distributedLinks

☆51

Alternatives and similar repositories for triton_distributed

Users that are interested in triton_distributed are comparing it to the libraries listed below

Sorting:

ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆459Updated this week
NVIDIA / nvidia-resiliency-ext
NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the …
☆187Updated this week
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆448Updated this week
google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
☆117Updated last year
ppl-ai / pplx-kernels
Perplexity GPU Kernels
☆395Updated last month
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆401Updated last month
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆385Updated this week
uccl-project / uccl
Ultra and Unified CCL
☆390Updated this week
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆388Updated last month
triton-inference-server / perf_analyzer
☆90Updated 2 weeks ago
yifuwang / symm-mem-recipes
☆94Updated 6 months ago
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆470Updated 3 months ago
coreweave / nccl-tests
NVIDIA NCCL Tests for Distributed Training
☆97Updated 3 weeks ago
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library
☆198Updated 3 weeks ago
AlibabaPAI / torchacc
PyTorch distributed training acceleration framework
☆51Updated 5 months ago
NVIDIA / cuda-checkpoint
CUDA checkpoint and restore utility
☆346Updated 5 months ago
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆288Updated last month
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆95Updated last month
NVIDIA / nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
☆474Updated 3 months ago
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆161Updated last week
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆251Updated 2 weeks ago
microsoft / NPKit
NCCL Profiling Kit
☆139Updated last year
microsoft / msccl
Microsoft Collective Communication Library
☆351Updated last year
facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆391Updated last week
HabanaAI / vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
☆77Updated this week
triton-inference-server / core
The core library and APIs implementing the Triton Inference Server.
☆138Updated this week
NVIDIA / DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
☆542Updated 2 months ago
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆406Updated 2 months ago
alibaba / TePDist
TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.
☆94Updated 2 years ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆146Updated 3 weeks ago