triton-inference-server / redis_cache
TRITONCACHE implementation of a Redis cache
☆16 · Updated 3 weeks ago
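As a TRITONCACHE implementation, the Redis cache is enabled when the server starts, using Triton's `--cache-config <cache>,<key>=<value>` flag. A minimal sketch, assuming a Redis instance reachable at `localhost:6379` (the model repository path and host/port values are illustrative):

```shell
# Start Triton with the Redis response cache enabled.
# Each --cache-config flag passes one <cache>,<key>=<value> setting
# to the named cache implementation ("redis" here).
tritonserver --model-repository=/models \
    --cache-config redis,host=localhost \
    --cache-config redis,port=6379
```

Individual models must still opt in to response caching in their model configuration for cached responses to be served.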
Alternatives and similar repositories for redis_cache
Users interested in redis_cache are comparing it to the libraries listed below.
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆17 · Updated 3 years ago
- ☆44 · Updated last week
- Module, Model, and Tensor Serialization/Deserialization ☆286 · Updated 5 months ago
- MLFlow Deployment Plugin for Ray Serve ☆46 · Updated 3 years ago
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆412 · Updated this week
- Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆124 · Updated this week
- ☆60 · Updated this week
- xet client tech, used in huggingface_hub ☆403 · Updated this week
- The Triton backend for ONNX Runtime. ☆172 · Updated this week
- Home for the OctoML PyTorch Profiler ☆113 · Updated 2 years ago
- ☆152 · Updated last month
- Some microbenchmarks and design docs before commencement ☆12 · Updated 5 years ago
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 · Updated 3 years ago
- The Triton backend for PyTorch TorchScript models. ☆173 · Updated this week
- Ray-based Apache Beam runner ☆42 · Updated 2 years ago
- A top-like tool for monitoring GPUs in a cluster ☆84 · Updated last year
- Python bindings for UCX ☆139 · Updated 4 months ago
- ☆31 · Updated 9 months ago
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool to help Cloud developers orchestrate training jobs on accelerat… ☆162 · Updated last week
- Core Utilities for NVIDIA Merlin ☆19 · Updated last year
- MLPerf™ logging library ☆38 · Updated last month
- Triton backend for managing model state tensors automatically in the sequence batcher ☆17 · Updated last year
- Unified storage framework for the entire machine learning lifecycle ☆155 · Updated last year
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆63 · Updated 4 months ago
- First token cutoff sampling inference example ☆30 · Updated 2 years ago
- Efficiently tune any LLM from HuggingFace using distributed training (multiple GPUs) and DeepSpeed. Uses Ray AIR to orchestrate the … ☆60 · Updated 2 years ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆164 · Updated 3 weeks ago
- An experimental implementation of compiler-driven automatic sharding of models across a given device mesh. ☆52 · Updated this week
- JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆404 · Updated last month
- TorchFix - a linter for PyTorch-using code with autofix support ☆152 · Updated 5 months ago