triton-inference-server / redis_cache
TRITONCACHE implementation of a Redis cache
☆16 · Updated last week
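For orientation: this cache plugs into Triton via the TRITONCACHE API and is configured at server startup. Below is a minimal sketch of launching Triton with the cache enabled, assuming `tritonserver` is on PATH, the built `libtritoncache_redis.so` has been placed in Triton's cache directory, and the cache is registered under the name `redis` with `host`/`port` keys (the key names follow common usage for this repo and should be treated as assumptions, not a confirmed interface):

```python
# Hypothetical launcher: starts Triton Inference Server with the Redis
# response cache enabled via --cache-config <cache_name>,<key>=<value>.
# Assumes tritonserver is on PATH and libtritoncache_redis.so is installed
# in Triton's cache directory (e.g. /opt/tritonserver/caches/redis/).
import subprocess

cmd = [
    "tritonserver",
    "--model-repository=/models",
    # "redis" cache name and host/port keys are assumptions based on the
    # repo's typical setup; check the repo README for the exact keys.
    "--cache-config", "redis,host=localhost",
    "--cache-config", "redis,port=6379",
]
subprocess.run(cmd, check=True)
```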
Alternatives and similar repositories for redis_cache
Users interested in redis_cache are comparing it to the libraries listed below.
- Module, Model, and Tensor Serialization/Deserialization ☆270 · Updated 2 months ago
- ☆39 · Updated this week
- Triton backend for managing the model state tensors automatically in sequence batcher ☆18 · Updated last year
- Unified storage framework for the entire machine learning lifecycle ☆155 · Updated last year
- MLFlow Deployment Plugin for Ray Serve ☆46 · Updated 3 years ago
- xet client tech, used in huggingface_hub ☆297 · Updated last week
- Ray-based Apache Beam runner ☆41 · Updated 2 years ago
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆395 · Updated this week
- The Triton backend for the ONNX Runtime. ☆162 · Updated last week
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆17 · Updated 3 years ago
- ☆145 · Updated this week
- A collection of reproducible inference engine benchmarks ☆34 · Updated 5 months ago
- ☆15 · Updated last month
- Python bindings for UCX ☆140 · Updated last month
- TorchFix - a linter for PyTorch-using code with autofix support ☆149 · Updated last month
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 · Updated 2 years ago
- ☆31 · Updated 6 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated 2 weeks ago
- Home for OctoML PyTorch Profiler ☆114 · Updated 2 years ago
- Simple dependency injection framework for Python ☆21 · Updated last year
- Distributed XGBoost on Ray ☆149 · Updated last year
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 · Updated 2 years ago
- The Triton backend for the PyTorch TorchScript models. ☆160 · Updated this week
- The backend behind the LLM-Perf Leaderboard ☆11 · Updated last year
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool to help Cloud developers orchestrate training jobs on accelerat… ☆146 · Updated this week
- A minimal shared memory object store design ☆54 · Updated 8 years ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆384 · Updated 4 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 · Updated last month
- ClearML Fractional GPU - Run multiple containers on the same GPU with driver level memory limitation ✨ and compute time-slicing ☆81 · Updated last year
- Some microbenchmarks and design docs before commencement ☆12 · Updated 4 years ago