triton-inference-server / redis_cache
TRITONCACHE implementation of a Redis cache
☆15 · Updated 2 weeks ago
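This repository implements Triton's TRITONCACHE API on top of Redis, so multiple Triton instances can share one response cache. A minimal launch sketch, assuming a Redis server reachable at localhost:6379 and a Triton build that includes this cache implementation; the model repository path is illustrative:

```python
# Minimal sketch: starting Triton with the Redis response cache enabled.
# Assumes Redis is running on localhost:6379 and this TRITONCACHE
# implementation is installed; the model repository path is illustrative.
import subprocess

subprocess.run([
    "tritonserver",
    "--model-repository=/models",
    # Generic --cache-config <cache>,<key>=<value> flags select and
    # configure the "redis" cache implementation.
    "--cache-config", "redis,host=localhost",
    "--cache-config", "redis,port=6379",
])
```

With the cache enabled, responses to repeated identical inference requests can be served from Redis instead of being recomputed, which is what makes a shared cache useful across replicas.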
Alternatives and similar repositories for redis_cache
Users interested in redis_cache are comparing it to the libraries listed below.
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆391 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆267 · Updated last month
- Triton backend for managing the model state tensors automatically in sequence batcher ☆18 · Updated last year
- xet client tech, used in huggingface_hub ☆236 · Updated this week
- Unified storage framework for the entire machine learning lifecycle ☆155 · Updated last year
- First token cutoff sampling inference example ☆31 · Updated last year
- A collection of reproducible inference engine benchmarks ☆33 · Updated 5 months ago
- MLFlow Deployment Plugin for Ray Serve ☆46 · Updated 3 years ago
- TorchFix - a linter for PyTorch-using code with autofix support ☆148 · Updated last month
- Home for OctoML PyTorch Profiler ☆114 · Updated 2 years ago
- The Triton backend for the ONNX Runtime ☆162 · Updated 2 weeks ago
- PostText is a QA system for querying your text data. When appropriate structured views are in place, PostText is good at answering querie… ☆32 · Updated 2 years ago
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆18 · Updated 3 years ago
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated this week
- The Triton backend for PyTorch TorchScript models ☆159 · Updated 2 weeks ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆404 · Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 · Updated last week
- Simple dependency injection framework for Python ☆21 · Updated last year
- Ray-based Apache Beam runner ☆41 · Updated 2 years ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆377 · Updated 3 months ago
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 · Updated 2 years ago
- FIL backend for the Triton Inference Server ☆82 · Updated 2 weeks ago
- Repository for the open inference protocol specification (a sample request sketch follows this list) ☆59 · Updated 4 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆161 · Updated last week
- Some microbenchmarks and design docs before commencement ☆12 · Updated 4 years ago
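Several entries above, including the open inference protocol specification and the Triton ONNX Runtime, TorchScript, and FIL backends, speak the same KServe v2 / open inference protocol over HTTP. A minimal request sketch, assuming a server listening on localhost:8000; the model name, tensor name, and data are illustrative:

```python
# Minimal sketch of an open inference protocol (KServe v2) HTTP request.
# The endpoint, model name, and input tensor are illustrative.
import requests

payload = {
    "inputs": [
        {
            "name": "input__0",            # tensor name defined by the model
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0],  # row-major values
        }
    ]
}
resp = requests.post(
    "http://localhost:8000/v2/models/example_model/infer",
    json=payload,
    timeout=10,
)
print(resp.json())  # {"model_name": ..., "outputs": [...]}
```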