ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

☆104

Alternatives and similar repositories for kvcached

Users interested in kvcached are comparing it to the libraries listed below.


sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆220 · Updated last week
hao-ai-lab / MuxServe
☆74 · Updated this week
WukLab / preble
Stateful LLM Serving
☆87 · Updated 7 months ago
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆79 · Updated 7 months ago
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆283 · Updated this week
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆186 · Updated last year
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆497 · Updated last month
stepfun-ai / StepMesh
☆307 · Updated 3 weeks ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆431 · Updated last week
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆497 · Updated last month
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆345 · Updated last month
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆129 · Updated last year
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …
☆278 · Updated 4 months ago
sgl-project / sglang-jax
JAX backend for SGL
☆77 · Updated this week
LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆54 · Updated 8 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆181 · Updated last week
flexflow / flexflow-serve
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
☆63 · Updated last month
tyler-griggs / melange-release
☆47 · Updated last year
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆138 · Updated 9 months ago
YaoJiayi / CacheBlend
☆141 · Updated 3 months ago
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: A Unified Checkpointing Library for LFMs
☆249 · Updated 3 months ago
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆252 · Updated last week
UChi-JCL / CacheGen
☆136 · Updated last year
fzyzcjy / torch_memory_saver
Allow torch tensor memory to be released and resumed later
☆150 · Updated this week
sii-research / VCCL
Venus Collective Communication Library, supported by SII and Infrawaves.
☆104 · Updated last week
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆112 · Updated 5 months ago
microsoft / vidur
A large-scale simulation framework for LLM inference
☆459 · Updated 2 months ago
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆439 · Updated this week
DeepLink-org / DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆70 · Updated this week
sgl-project / sgl-cookbook
Make SGLang go brrr
☆36 · Updated 3 weeks ago