triton-inference-server / redis_cache
TRITONCACHE implementation of a Redis cache
☆13 Updated 3 weeks ago
Alternatives and similar repositories for redis_cache:
Users who are interested in redis_cache are comparing it to the repositories listed below.
- This repository contains statistics about the AI Infrastructure products. ☆18 Updated last month
- Triton backend for managing the model state tensors automatically in sequence batcher ☆16 Updated last year
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 Updated 2 years ago
- First token cutoff sampling inference example ☆29 Updated last year
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 Updated last year
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆18 Updated 2 years ago
- ☆22 Updated this week
- xet client tech, used in huggingface_hub ☆61 Updated this week
- Simple dependency injection framework for Python ☆20 Updated 10 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 Updated 6 months ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆25 Updated this week
- Creating Generative AI Apps which work ☆17 Updated 8 months ago
- ☆56 Updated last week
- PostText is a QA system for querying your text data. When appropriate structured views are in place, PostText is good at answering querie… ☆31 Updated last year
- The backend behind the LLM-Perf Leaderboard ☆10 Updated 10 months ago
- ☆14 Updated last month
- Make triton easier ☆47 Updated 9 months ago
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆91 Updated this week
- Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling" ☆16 Updated 4 months ago
- MLFlow Deployment Plugin for Ray Serve ☆44 Updated 2 years ago
- ☆37 Updated 2 years ago
- Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search" ☆63 Updated last year
- Tutorial to get started with SkyPilot! ☆57 Updated 10 months ago
- Vector Database with support for late interaction and token level embeddings. ☆53 Updated 6 months ago
- Sentence Embedding as a Service ☆15 Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆128 Updated this week
- Distributed ML Optimizer ☆30 Updated 3 years ago
- A library for simplifying fine tuning with multi gpu setups in the Huggingface ecosystem. ☆16 Updated 5 months ago
- ☆15 Updated last year
- ☆12 Updated last year