triton-inference-server / stateful_backend

Triton backend for automatically managing model state tensors in the sequence batcher

☆17

Alternatives and similar repositories for stateful_backend

Users interested in stateful_backend are comparing it to the libraries listed below.


triton-inference-server / onnxruntime_backend
The Triton backend for the ONNX Runtime.
☆156 · Updated this week
triton-inference-server / pytorch_backend
The Triton backend for the PyTorch TorchScript models.
☆158 · Updated this week
microsoft / batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
☆102 · Updated 11 months ago
triton-inference-server / model_navigator
Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
☆210 · Updated 3 months ago
triton-inference-server / tensorflow_backend
The Triton backend for TensorFlow.
☆52 · Updated last month
tensorchord / inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
☆28 · Updated 2 years ago
sdpython / onnxcustom
Tutorial on how to convert machine learned models into ONNX
☆16 · Updated 2 years ago
hamelsmu / llama-inference
experiments with inference on llama
☆104 · Updated last year
triton-inference-server / fil_backend
FIL backend for the Triton Inference Server
☆81 · Updated last week
triton-inference-server / openvino_backend
OpenVINO backend for Triton.
☆32 · Updated this week
triton-inference-server / redis_cache
TRITONCACHE implementation of a Redis cache
☆14 · Updated 2 weeks ago
lessw2020 / transformer_central
Various transformers for FSDP research
☆37 · Updated 2 years ago
determined-ai / determined-examples
Example ML projects that use the Determined library.
☆32 · Updated 10 months ago
triton-inference-server / triton_cli
Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen…
☆66 · Updated this week
Narsil / bloomserver
☆39 · Updated 2 years ago
triton-inference-server / common
Common source, scripts and utilities shared across all Triton repositories.
☆75 · Updated this week
AniZpZ / smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆11 · Updated last year
wangkuiyi / huggingface-tokenizer-in-cxx
☆68 · Updated 2 years ago
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆76 · Updated last week
premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆137 · Updated last year
triton-inference-server / backend
Common source, scripts and utilities for creating Triton backends.
☆337 · Updated this week
pisa-engine / BMP
Faster Learned Sparse Retrieval with Block-Max Pruning. ACM SIGIR 2024.
☆31 · Updated last week
google / space
Unified storage framework for the entire machine learning lifecycle
☆156 · Updated last year
nod-ai / transformer-benchmarks
benchmarking some transformer deployments
☆26 · Updated 2 years ago
JulesBelveze / bert-squeeze
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
☆84 · Updated 8 months ago
Snowflake-Labs / vllm
☆15 · Updated 4 months ago
triton-inference-server / core
The core library and APIs implementing the Triton Inference Server.
☆145 · Updated last week
IlyasMoutawwakil / py-txi
A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.
☆33 · Updated 3 months ago
ashvardanian / jaccard-index
Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables
☆20 · Updated 2 months ago
npuichigo / openai_trtllm
OpenAI compatible API for TensorRT LLM triton backend
☆209 · Updated last year