triton-inference-server / redis_cache
TRITONCACHE implementation of a Redis cache
☆ 13 · Updated last week
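For context, Triton selects and configures a response cache at server startup through repeated `--cache-config <cache>,<key>=<value>` flags; a Redis-backed cache like this one is typically pointed at a running Redis instance that way. A minimal sketch (the model-repository path, host, and port values are illustrative, and the Redis cache shared library is assumed to be installed under Triton's cache directory):

```shell
# Launch Triton with the Redis cache implementation enabled.
# Assumes the redis cache library has been built and placed under
# Triton's cache directory (paths and host/port are illustrative).
tritonserver \
  --model-repository=/models \
  --cache-config redis,host=localhost \
  --cache-config redis,port=6379
```

Per-model caching is then opted into in each model's configuration, so only models that enable the response cache will consult Redis.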
Alternatives and similar repositories for redis_cache:
Users interested in redis_cache are comparing it to the libraries listed below.
- WIP. Veloce is a low-code, Ray-based parallelization library for novel, efficient, and heterogeneous machine learning computation. ☆ 18 · Updated 2 years ago
- Simple dependency injection framework for Python. ☆ 20 · Updated 9 months ago
- Benchmark for machine learning model online serving (LLM, embedding, Stable Diffusion, Whisper). ☆ 28 · Updated last year
- Creating generative AI apps that work. ☆ 16 · Updated 7 months ago
- Make Triton easier. ☆ 44 · Updated 8 months ago
- The backend behind the LLM-Perf Leaderboard. ☆ 10 · Updated 9 months ago
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆ 18 · Updated 2 years ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆ 21 · Updated this week
- ☆ 53 · Updated last month
- Some microbenchmarks and design docs before commencement. ☆ 12 · Updated 4 years ago
- Triton backend for managing the model state tensors automatically in the sequence batcher. ☆ 14 · Updated last year
- This repository contains statistics about AI infrastructure products. ☆ 18 · Updated 3 weeks ago
- MLflow deployment plugin for Ray Serve. ☆ 43 · Updated 2 years ago
- Core utilities for NVIDIA Merlin. ☆ 19 · Updated 6 months ago
- ☆ 37 · Updated 2 years ago
- IBM development fork of https://github.com/huggingface/text-generation-inference. ☆ 59 · Updated 2 months ago
- Sentence embedding as a service. ☆ 14 · Updated last year
- First-token-cutoff sampling inference example. ☆ 29 · Updated last year
- The driver for LMCache core to run in vLLM. ☆ 29 · Updated 2 weeks ago
- Benchmark suite for LLMs from Fireworks.ai. ☆ 66 · Updated last week
- Vector database with support for late interaction and token-level embeddings. ☆ 52 · Updated 4 months ago
- Open-sourced backend for Martian's LLM Inference Provider Leaderboard. ☆ 17 · Updated 6 months ago
- Code repository for the paper "AdANNS: A Framework for Adaptive Semantic Search". ☆ 62 · Updated last year
- Super-fast structured outputs. ☆ 114 · Updated this week
- ☆ 22 · Updated this week
- Lightning fast: Faiss CPU + ONNX quantized multilingual embedding model. ☆ 23 · Updated 5 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data. ☆ 29 · Updated 4 months ago
- Machine learning inference graph spec. ☆ 21 · Updated 5 years ago
- Self-host LLMs with vLLM and BentoML. ☆ 87 · Updated this week