triton-inference-server / stateful_backend
Triton backend for managing model state tensors automatically in the sequence batcher
☆15 · Updated last year
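For context, state handling in Triton's sequence batcher is configured per model in `config.pbtxt`. Below is a minimal sketch of Triton's implicit state management config, which addresses the same problem this backend targets; the tensor names, dtype, and dims are illustrative assumptions, not names this repository requires:

```protobuf
# Illustrative sequence_batching config with implicit state management.
# "INPUT_STATE" / "OUTPUT_STATE" are assumed example tensor names.
sequence_batching {
  state [
    {
      input_name: "INPUT_STATE"    # state tensor fed back into the model each step
      output_name: "OUTPUT_STATE"  # state tensor produced by the model each step
      data_type: TYPE_FP32
      dims: [ -1 ]
      initial_state: {
        data_type: TYPE_FP32
        dims: [ 1 ]
        zero_data: true            # start each new sequence from zeros
        name: "initial state"
      }
    }
  ]
}
```

With a config like this, Triton keeps each sequence's state server-side between requests, so clients do not have to ship state tensors back and forth.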
Alternatives and similar repositories for stateful_backend:
Users interested in stateful_backend are comparing it to the libraries listed below.
- The Triton backend for the ONNX Runtime. ☆140 · Updated 2 weeks ago
- TRITONCACHE implementation of a Redis cache ☆13 · Updated 2 weeks ago
- The Triton backend for TensorRT. ☆70 · Updated 2 weeks ago
- ☆60 · Updated 2 years ago
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 · Updated 2 years ago
- ☆14 · Updated last month
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆61 · Updated last week
- Distributed ML Optimizer ☆30 · Updated 3 years ago
- OpenVINO backend for Triton. ☆31 · Updated 2 weeks ago
- MLFlow Deployment Plugin for Ray Serve ☆44 · Updated 2 years ago
- Common source, scripts, and utilities shared across all Triton repositories. ☆69 · Updated this week
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. ☆199 · Updated 2 months ago
- Tutorial on how to convert machine-learned models into ONNX ☆16 · Updated 2 years ago
- Distributed preprocessing and data loading for language datasets ☆39 · Updated 11 months ago
- The Triton backend for PyTorch TorchScript models. ☆144 · Updated 2 weeks ago
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆11 · Updated last year
- The core library and APIs implementing the Triton Inference Server. ☆123 · Updated last week
- Some microbenchmarks and design docs before commencement ☆12 · Updated 4 years ago
- A collection of building blocks for fine-tunable metric learning models ☆32 · Updated 2 months ago
- A boilerplate to use multiprocessing for your gRPC server in your Python project ☆25 · Updated 3 years ago
- Make triton easier ☆47 · Updated 9 months ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆25 · Updated this week
- Benchmarks to capture important workloads. ☆30 · Updated 2 months ago
- Benchmarking some transformer deployments ☆26 · Updated 2 years ago
- Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and… ☆19 · Updated last week
- NASRec Weight Sharing Neural Architecture Search for Recommender Systems ☆29 · Updated last year
- Open sourced backend for Martian's LLM Inference Provider Leaderboard ☆17 · Updated 7 months ago
- ☆21 · Updated 3 weeks ago
- Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva ☆85 · Updated last month