bytedance / InfiniStore

KV cache store for distributed LLM inference

☆361

Alternatives and similar repositories for InfiniStore

Users who are interested in InfiniStore are comparing it to the repositories listed below.

AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆510 Updated 2 months ago
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆721 Updated this week
ovg-project / kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
☆661 Updated 2 weeks ago
stepfun-ai / StepMesh
☆316 Updated last week
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆529 Updated 2 weeks ago
SJTU-IPADS / PhoenixOS
Fast OS-level support for GPU checkpoint and restore
☆257 Updated last month
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆489 Updated 7 months ago
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆437 Updated this week
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆115 Updated 6 months ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆436 Updated 5 months ago
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆142 Updated 10 months ago
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆291 Updated this week
LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆58 Updated 9 months ago
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆230 Updated last week
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆445 Updated last month
ai-dynamo / aiconfigurator
Offline optimization of your disaggregated Dynamo graph
☆106 Updated this week
sgl-project / sgl-learning-materials
Materials for learning SGLang
☆650 Updated this week
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆729 Updated 7 months ago
sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
☆312 Updated last week
zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆168 Updated 7 months ago
antgroup / DeepXTrace
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆67 Updated 2 weeks ago
WukLab / preble
Stateful LLM Serving
☆88 Updated 8 months ago
vllm-project / flash-attention
Fast and memory-efficient exact attention
☆99 Updated this week
abcdabcd987 / libfabric-efa-demo
☆71 Updated 10 months ago
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆385 Updated last week
perplexityai / pplx-garden
Perplexity open source garden for inference technology
☆232 Updated this week
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆483 Updated this week
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆187 Updated last month
fzyzcjy / torch_memory_saver
Allow torch tensor memory to be released and resumed later
☆167 Updated last week
AlibabaPAI / torchacc
PyTorch distributed training acceleration framework
☆53 Updated 3 months ago