triton-inference-server / redis_cache
TRITONCACHE implementation of a Redis cache
☆13 · Updated this week
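As a rough sketch of how a TRITONCACHE implementation like this is typically wired in: Triton Inference Server loads a cache implementation at startup and passes it settings through the `--cache-config` flag. The setting names below (`host`, `port`) are assumptions for illustration; the exact keys the Redis cache accepts are documented in the redis_cache README.

```shell
# Hypothetical launch sketch: enable Triton's response cache using the
# Redis implementation. Each --cache-config entry is <cache_name>,<key>=<value>.
# The host/port setting names are assumed here, not confirmed from the source.
tritonserver \
  --model-repository=/models \
  --cache-config=redis,host=localhost \
  --cache-config=redis,port=6379
```

Models then opt in to response caching individually via their model configuration, so enabling the cache server-wide does not by itself cache every request.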
Alternatives and similar repositories for redis_cache
Users interested in redis_cache are comparing it to the libraries listed below.
- xet client tech, used in huggingface_hub ☆95 · Updated this week
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 · Updated 2 years ago
- ☆15 · Updated last month
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆18 · Updated 2 years ago
- The backend behind the LLM-Perf Leaderboard ☆10 · Updated last year
- A collection of reproducible inference engine benchmarks ☆30 · Updated 3 weeks ago
- ☆39 · Updated 2 years ago
- Vector Database with support for late interaction and token level embeddings. ☆54 · Updated 7 months ago
- Simple dependency injection framework for Python ☆21 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆27 · Updated this week
- Make triton easier ☆47 · Updated 11 months ago
- ☆32 · Updated this week
- Some microbenchmarks and design docs before commencement ☆12 · Updated 4 years ago
- 🤝 Trade any tensors over the network ☆30 · Updated last year
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 · Updated last year
- ☆21 · Updated 2 months ago
- First token cutoff sampling inference example ☆30 · Updated last year
- PostText is a QA system for querying your text data. When appropriate structured views are in place, PostText is good at answering querie… ☆32 · Updated last year
- Distributed ML Optimizer ☆32 · Updated 3 years ago
- Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and… ☆24 · Updated last month
- Sentence Embedding as a Service ☆15 · Updated last year
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆72 · Updated this week
- Triton backend for managing the model state tensors automatically in sequence batcher ☆17 · Updated last year
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated last week
- Simple high-throughput inference library ☆46 · Updated this week
- ☆13 · Updated last year
- MLFlow Deployment Plugin for Ray Serve ☆44 · Updated 3 years ago
- This repository contains statistics about AI infrastructure products. ☆18 · Updated 2 months ago
- Cortex-compatible model server for Python and TensorFlow ☆17 · Updated 2 years ago
- Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables ☆18 · Updated this week