triton-inference-server / redis_cache
TRITONCACHE implementation of a Redis cache
☆16 Updated 2 weeks ago
Alternatives and similar repositories for redis_cache
Users who are interested in redis_cache are comparing it to the libraries listed below.
- ☆43 Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆279 Updated 4 months ago
- Triton backend for managing the model state tensors automatically in sequence batcher ☆18 Updated last year
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆407 Updated last week
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 Updated 2 years ago
- Benchmark suite for LLMs from Fireworks.ai ☆84 Updated last month
- Home for OctoML PyTorch Profiler ☆114 Updated 2 years ago
- The Triton backend for the PyTorch TorchScript models. ☆167 Updated this week
- The Triton backend for the ONNX Runtime. ☆170 Updated 2 weeks ago
- A collection of reproducible inference engine benchmarks ☆38 Updated 8 months ago
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆119 Updated last week
- Some microbenchmarks and design docs before commencement ☆12 Updated 4 years ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆62 Updated 3 months ago
- Benchmarking some transformer deployments ☆26 Updated last week
- MLFlow Deployment Plugin for Ray Serve ☆46 Updated 3 years ago
- Distributed ML Optimizer ☆34 Updated 4 years ago
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆17 Updated 3 years ago
- ☆31 Updated 8 months ago
- TorchFix - a linter for PyTorch-using code with autofix support ☆152 Updated 4 months ago
- xet client tech, used in huggingface_hub ☆356 Updated this week
- Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search" ☆65 Updated 2 years ago
- Core Utilities for NVIDIA Merlin ☆19 Updated last year
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆162 Updated this week
- The Triton backend for TensorFlow. ☆55 Updated last month
- Merlin Systems provides tools for combining recommendation models with other elements of production recommender systems (like feature sto… ☆94 Updated last year
- Simple dependency injection framework for Python ☆21 Updated last year
- ☆16 Updated last month
- Python bindings for UCX ☆140 Updated 3 months ago
- An experimental implementation of compiler-driven automatic sharding of models across a given device mesh. ☆48 Updated this week
- MLPerf™ logging library ☆37 Updated last week