triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

☆10,060

Alternatives and similar repositories for server

Users who are interested in server often compare it to the repositories listed below.

pytorch / serve
Serve, optimize and scale PyTorch models in production
☆4,353 Updated 3 months ago
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,355 Updated last year
NVIDIA / TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone…
☆12,403 Updated 2 weeks ago
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆14,301 Updated this week
pytorch / TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
☆2,894 Updated this week
triton-lang / triton
Development repository for the Triton language and compiler
☆17,668 Updated this week
NVIDIA / DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep lear…
☆5,562 Updated this week
NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆4,258 Updated 2 weeks ago
apache / tvm
Open Machine Learning Compiler Framework
☆12,835 Updated last week
NVIDIA / TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat…
☆12,203 Updated last week
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆20,669 Updated last week
bitsandbytes-foundation / bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
☆7,767 Updated last week
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆2,954 Updated this week
sgl-project / sglang
SGLang is a fast serving framework for large language models and vision language models.
☆20,402 Updated this week
onnx / onnx-tensorrt
ONNX-TensorRT: TensorRT backend for ONNX
☆3,172 Updated 3 weeks ago
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆4,136 Updated this week
onnx / onnx
Open standard for machine learning interoperability
☆19,933 Updated this week
huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization…
☆3,188 Updated 2 weeks ago
daquexian / onnx-simplifier
Simplify your onnx model
☆4,232 Updated 3 months ago
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆8,828 Updated last week
triton-inference-server / tutorials
This repository contains tutorials and examples for Triton Inference Server
☆801 Updated 2 weeks ago
facebookresearch / fairscale
PyTorch extensions for high performance and large scale training.
☆3,386 Updated 7 months ago
CVCUDA / CV-CUDA
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
☆2,606 Updated 2 weeks ago
facebookincubator / AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…
☆4,695 Updated last month
intel / neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, …
☆2,533 Updated this week
NVIDIA / apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
☆8,854 Updated last week
horovod / horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
☆14,628 Updated 3 weeks ago
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,664 Updated last week
triton-inference-server / client
Triton Python, C++ and Java client libraries, and gRPC-generated client examples for Go, Java and Scala.
☆663 Updated 2 weeks ago
microsoft / onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
☆18,442 Updated last week