llm-d/llm-d-kv-cache

Distributed KV cache scheduling & offloading libraries

☆164

Alternatives and similar repositories for llm-d-kv-cache

Users who are interested in llm-d-kv-cache are comparing it to the libraries listed below.

llm-d / llm-d-router
llm-d Router: The intelligent entry point for inference requests
☆269 · Updated this week
llm-d / llm-d-routing-sidecar
Incubating P/D sidecar for llm-d
☆17 · Nov 13, 2025 · Updated 8 months ago
llm-d / llm-d-inference-sim
A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h…
☆170 · Updated this week
llm-d / llm-d-benchmark
llm-d benchmark scripts and tooling
☆62 · Updated this week
agents-first / clawdchan
Let my Claude talk to yours.
☆30 · Updated this week
llm-d / llm-d
Achieve state of the art inference performance with modern accelerators on Kubernetes
☆3,875 · Updated this week
llm-d / llm-d-deployer
Helm charts for llm-d
☆52 · Jul 22, 2025 · Updated last year
llm-d-incubation / llm-d-modelservice
Helm charts for deploying models with llm-d
☆31 · Jun 27, 2026 · Updated 3 weeks ago
kubernetes-sigs / gateway-api-inference-extension
Gateway API Inference Extension
☆723 · Updated this week
llm-d / llm-d-model-service
Simplified model deployment on llm-d
☆29 · Jul 2, 2025 · Updated last year
llm-d-incubation / llm-d-infra
llm-d Helm charts and deployment examples
☆59 · May 1, 2026 · Updated 2 months ago
InftyAI / Manta
💫 A lightweight p2p-based cache system for model distributions on Kubernetes. Reframing now to make it a unified cache system with POSI…
☆27 · Dec 6, 2024 · Updated last year
kubernetes-sigs / inference-perf
GenAI inference performance benchmarking tool
☆212 · Updated this week
llm-d / llm-d-workload-variant-autoscaler
Variant optimization autoscaler for distributed inference workloads
☆52 · Updated this week
llm-d / llm-d-inference-payload-processor
Inference payload processor for llm-d
☆16 · Updated this week
vllm-project / agentic-api
Stateful API logic for agentic applications using vLLM
☆54 · Updated this week
ome-projects / ome
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T…
☆482 · Updated this week
llm-d-incubation / llm-d-fast-model-actuation
Kubernetes controllers for fast model actuation using vLLM sleep/wake and launcher-based model swapping
☆16 · Updated this week
ai-dynamo / grove
Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling
☆244 · Updated this week
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,151 · Updated this week
ai-dynamo / aiconfigurator
Offline optimization of your disaggregated Dynamo graph
☆374 · Updated this week
clusterlink-net / clusterlink
A Gateway for connecting application services in different domains, networks, and cloud infrastructures
☆23 · Feb 1, 2026 · Updated 5 months ago
kerthcet / github-workflow-as-kube
Following the same workflows as Kubernetes. Widely used in InftyAI community.
☆13 · May 31, 2026 · Updated last month
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆425 · Nov 13, 2025 · Updated 8 months ago
kfirtoledo / multi-mcp
☆110 · Jul 21, 2025 · Updated last year
gammagrid / gammagrid
Open-source options gamma exposure (GEX) & positioning dashboard — dealer GEX, max pain, open interest, IV surface. Self-hosted, Docker, …
☆47 · Updated this week
kubernetes-sigs / lws
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
☆769 · Updated this week
InftyAI / llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
☆309 · Jan 26, 2026 · Updated 6 months ago
volcano-sh / kthena
Kubernetes-native AI serving platform for scalable model serving.
☆396 · Updated this week
DaoCloud / ckube
A high-performance proxy component for the Kubernetes APIServer. It proxies the APIServer's List requests; other request types are reverse-proxied directly to the native APIServer. CKube additionally supports pagination, search, and indexing. CKube is also 100% compatible with native kubectl and ku…
☆19 · Sep 16, 2022 · Updated 3 years ago
llm-d-incubation / llm-d-planner
☆25 · Updated this week
kubernetes-sigs / wg-serving
WG Serving
☆38 · Mar 24, 2026 · Updated 4 months ago
vllm-project / router
A high-performance and light-weight router for vLLM large scale deployment
☆328 · Updated this week
sgl-project / rbg
A workload for deploying LLM inference services on Kubernetes
☆263 · Updated this week
ovg-project / kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
☆1,115 · Updated this week
samzong / gmc
Parallel git worktrees for parallel AI agents — plus AI-generated commits.
☆18 · Jul 1, 2026 · Updated 3 weeks ago
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆7,580 · Updated this week
kai-scheduler / KAI-Scheduler
KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
☆1,409 · Updated this week
d-run / drun-docs
d.run website
☆18 · Jul 3, 2026 · Updated 3 weeks ago