triton-inference-server / stateful_backend
Triton backend for automatically managing model state tensors in the sequence batcher
☆17 Updated last year
Alternatives and similar repositories for stateful_backend:
Users who are interested in stateful_backend are comparing it to the libraries listed below.
- The Triton backend for the ONNX Runtime. ☆140 Updated last week
- TRITONCACHE implementation of a Redis cache ☆13 Updated last week
- Tutorial on how to convert machine learned models into ONNX ☆16 Updated 2 years ago
- The Triton backend for TensorRT. ☆73 Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 Updated 4 months ago
- The core library and APIs implementing the Triton Inference Server. ☆124 Updated last week
- Make triton easier ☆47 Updated 10 months ago
- ☆39 Updated 2 years ago
- Cortex-compatible model server for Python and TensorFlow ☆17 Updated 2 years ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆62 Updated last month
- ☆15 Updated 3 weeks ago
- ☆65 Updated 2 years ago
- Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and… ☆19 Updated last month
- PostText is a QA system for querying your text data. When appropriate structured views are in place, PostText is good at answering querie… ☆32 Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆26 Updated this week
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆11 Updated last year
- XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval ☆50 Updated 10 months ago
- The Triton backend for the PyTorch TorchScript models. ☆146 Updated this week
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 Updated last year
- Some microbenchmarks and design docs before commencement ☆12 Updated 4 years ago
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. ☆199 Updated 3 months ago
- The Triton backend for TensorFlow. ☆51 Updated last week
- Vector Database with support for late interaction and token level embeddings. ☆54 Updated 6 months ago
- FIL backend for the Triton Inference Server ☆77 Updated 2 weeks ago
- Distributed ML Optimizer ☆32 Updated 3 years ago
- Distributed preprocessing and data loading for language datasets ☆39 Updated last year
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 Updated 2 years ago
- Sentence Embedding as a Service ☆15 Updated last year
- Common source, scripts and utilities shared across all Triton repositories. ☆69 Updated last week
- Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search" ☆64 Updated last year