triton-inference-server / stateful_backend
Triton backend for managing the model state tensors automatically in sequence batcher
☆17 · Updated last year
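The stateful backend keeps each sequence's state tensors on the server, so a client only tags its requests with a sequence ID plus start/end flags and the sequence batcher routes them to the same model instance. Below is a minimal client-side sketch using the `tritonclient` Python package; the model name `my_stateful_model` and the tensor names `INPUT`/`OUTPUT` are hypothetical placeholders for illustration, not names defined by this repository.

```python
# Minimal sketch: driving a sequence-batched (stateful) Triton model from a
# Python client. Assumes a Triton server on localhost:8000 serving a model
# named "my_stateful_model" with tensors "INPUT" and "OUTPUT" (hypothetical).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def infer_step(value, sequence_id, start=False, end=False):
    inp = httpclient.InferInput("INPUT", [1, 1], "FP32")
    inp.set_data_from_numpy(np.array([[value]], dtype=np.float32))
    # sequence_id / sequence_start / sequence_end tell the sequence batcher
    # which requests belong to one sequence; the backend carries the model's
    # state tensors across these calls on the server side.
    result = client.infer(
        model_name="my_stateful_model",
        inputs=[inp],
        sequence_id=sequence_id,
        sequence_start=start,
        sequence_end=end,
    )
    return result.as_numpy("OUTPUT")

# Three requests in one sequence: state persists between them on the server.
print(infer_step(1.0, sequence_id=42, start=True))
print(infer_step(2.0, sequence_id=42))
print(infer_step(3.0, sequence_id=42, end=True))
```

Because the state never leaves the server, the client payload stays small; only the sequence bookkeeping flags distinguish this from stateless inference.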
Alternatives and similar repositories for stateful_backend
Users interested in stateful_backend are comparing it to the libraries listed below.
- The Triton backend for the ONNX Runtime. ☆153 · Updated last week
- Tutorial on how to convert machine-learned models into ONNX ☆16 · Updated 2 years ago
- Make triton easier ☆46 · Updated last year
- 🤝 Trade any tensors over the network ☆30 · Updated last year
- Cortex-compatible model server for Python and TensorFlow ☆17 · Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆33 · Updated last month
- Triton CLI is an open-source command-line interface that enables users to create, deploy, and profile models served by the Triton Inference Server. ☆64 · Updated 2 weeks ago
- Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp ☆14 · Updated 4 months ago
- TRITONCACHE implementation of a Redis cache ☆14 · Updated 2 weeks ago
- Truly flash implementation of the DeBERTa disentangled attention mechanism. ☆58 · Updated last month
- The Triton backend for TensorRT. ☆77 · Updated last week
- GGML implementation of BERT model with Python bindings and quantization. ☆55 · Updated last year
- ☆67 · Updated 2 years ago
- Benchmarking some transformer deployments ☆26 · Updated 2 years ago
- A collection of reproducible inference engine benchmarks ☆31 · Updated 2 months ago
- ☆39 · Updated 2 years ago
- OpenVINO backend for Triton. ☆32 · Updated last week
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.