triton-inference-server / redis_cache
TRITONCACHE implementation of a Redis cache
☆12, updated this week
Related projects
Alternatives and complementary repositories for redis_cache
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) (☆27, updated last year)
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. (☆18, updated 2 years ago)
- MLFlow Deployment Plugin for Ray Serve (☆42, updated 2 years ago)
- PostText is a QA system for querying your text data. When appropriate structured views are in place, PostText is good at answering querie… (☆31, updated last year)
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… (☆18, updated last year)
- This repository contains statistics about the AI Infrastructure products. (☆18, updated 4 months ago)
- Make triton easier (☆41, updated 5 months ago)
- Distributed ML Optimizer (☆30, updated 3 years ago)
- Some microbenchmarks and design docs before commencement (☆12, updated 3 years ago)
- Sentence Embedding as a Service (☆14, updated last year)
- First token cutoff sampling inference example (☆28, updated 10 months ago)
- Official code for "Binary embedding based retrieval at Tencent" (☆42, updated 8 months ago)
- vLLM adapter for a TGIS-compatible gRPC server. (☆10, updated this week)
- Triton backend for managing the model state tensors automatically in sequence batcher (☆13, updated 9 months ago)
- Vector Database with support for late interaction and token level embeddings. (☆54, updated last month)
- Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search" (☆60, updated last year)
- Open sourced backend for Martian's LLM Inference Provider Leaderboard (☆17, updated 3 months ago)
- Lightning Fast: Faiss CPU + Onnx Quantized Multilingual Embedding Model (☆22, updated 2 months ago)
- Simple dependency injection framework for Python (☆20, updated 6 months ago)
- Distributed Approximate Nearest Neighbors Database https://anndb.com (☆35, updated 3 years ago)
- A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-… (☆66, updated last year)
- Core Utilities for NVIDIA Merlin (☆19, updated 3 months ago)
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. (☆78, updated this week)
- The Triton backend for the PyTorch TorchScript models. (☆127, updated this week)
- Cortex-compatible model server for Python and TensorFlow (☆17, updated last year)