triton-inference-server / stateful_backend
Triton backend for managing model state tensors automatically in the sequence batcher
☆14 · Updated 11 months ago
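For context on what "managing model state tensors automatically in the sequence batcher" means in practice, below is a minimal client-side sketch using the tritonclient Python package: requests that share a `sequence_id` are routed by Triton's sequence batcher to the same model instance, and the backend carries the state between them, so the client never sends state tensors explicitly. The model name and tensor names are placeholders for illustration, not taken from the stateful_backend repository.

```python
# Hedged sketch of calling a sequence-batched, stateful Triton model.
# "my_stateful_model", "INPUT", and "OUTPUT" are hypothetical names.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

sequence_id = 42  # requests sharing this id share the same server-side state slot
chunks = np.split(np.random.rand(8, 16).astype(np.float32), 8)  # 8 steps of shape (1, 16)

for step, chunk in enumerate(chunks):
    inp = httpclient.InferInput("INPUT", list(chunk.shape), "FP32")
    inp.set_data_from_numpy(chunk)
    result = client.infer(
        model_name="my_stateful_model",
        inputs=[inp],
        sequence_id=sequence_id,
        sequence_start=(step == 0),              # first request initializes the state
        sequence_end=(step == len(chunks) - 1),  # last request lets the backend release it
    )
    print(result.as_numpy("OUTPUT"))
```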
Alternatives and similar repositories for stateful_backend:
Users interested in stateful_backend are comparing it to the libraries listed below.
- The Triton backend for the ONNX Runtime. ☆136 · Updated this week
- Cortex-compatible model server for Python and TensorFlow ☆17 · Updated 2 years ago
- TRITONCACHE implementation of a Redis cache ☆13 · Updated this week
- MLFlow Deployment Plugin for Ray Serve ☆43 · Updated 2 years ago
- ☆54 · Updated last year
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 · Updated 2 years ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆52 · Updated this week
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. ☆193 · Updated this week
- Distributed ML Optimizer ☆30 · Updated 3 years ago
- Benchmarking some transformer deployments ☆26 · Updated last year
- The Triton backend for TensorFlow. ☆45 · Updated this week
- Common source, scripts and utilities shared across all Triton repositories. ☆65 · Updated this week
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆27 · Updated last year
- OpenVINO backend for Triton. ☆30 · Updated this week
- Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers ☆127 · Updated 3 weeks ago
- Some microbenchmarks and design docs before commencement ☆12 · Updated 3 years ago
- Minimal example of using a traced huggingface transformers model with libtorch ☆35 · Updated 4 years ago
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆60 · Updated 3 months ago
- AIBench, a tool for comparing and evaluating AI serving solutions. Forked from [tsbs](https://github.com/timescale/tsbs) and adapted to A… ☆20 · Updated 4 months ago
- Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆86 · Updated this week
- The Triton backend for the PyTorch TorchScript models. ☆139 · Updated this week
- 🛠️ Tools for Transformers compression using PyTorch Lightning ⚡ ☆81 · Updated 2 months ago
- The collection of building blocks for building fine-tunable metric learning models ☆32 · Updated last week
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆18 · Updated 2 years ago
- A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL ☆29 · Updated 2 years ago
- The core library and APIs implementing the Triton Inference Server. ☆114 · Updated this week
- Open sourced backend for Martian's LLM Inference Provider Leaderboard ☆17 · Updated 5 months ago
- Benchmarks to capture important workloads. ☆29 · Updated this week
- Go library that provides easy-to-use interfaces and tools for TensorFlow users, in particular allowing existing TF models to be trained on .ta… ☆14 · Updated 10 months ago
- This repository contains statistics about AI infrastructure products. ☆18 · Updated 6 months ago