bd-iaas-us / InfiniStore
A distributed KV store for disaggregated LLM inference
☆31 · Updated this week
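For context: in disaggregated LLM inference, prefill and decode can run on different workers, so attention KV-cache blocks must move through a shared store instead of staying in one process's GPU memory. The sketch below is a minimal, hypothetical Python illustration of that idea — an in-memory stand-in keyed by (request, layer) — and is not InfiniStore's actual API; all names here are invented for illustration.

```python
# Hypothetical sketch of the disaggregated-KV-cache idea: a prefill worker
# publishes per-layer KV blocks for a request, and a decode worker fetches
# them. The in-memory dict stands in for a networked store like InfiniStore;
# the class and method names are illustrative, not InfiniStore's API.
from typing import Dict, Tuple

Key = Tuple[str, int]  # (request_id, layer_index)

class KVCacheStore:
    def __init__(self) -> None:
        self._blocks: Dict[Key, bytes] = {}

    def put(self, request_id: str, layer: int, kv_block: bytes) -> None:
        # A real store would write this to a remote memory pool (e.g. via RDMA).
        self._blocks[(request_id, layer)] = kv_block

    def get(self, request_id: str, layer: int) -> bytes:
        # A real store would read the block from wherever it lives in the pool.
        return self._blocks[(request_id, layer)]

# Prefill side: compute KV for every layer, then hand off.
store = KVCacheStore()
for layer in range(2):
    store.put("req-42", layer, b"serialized KV tensor for this layer")

# Decode side (possibly another machine): pull the KV cache and resume.
first_layer_kv = store.get("req-42", 0)
print(len(first_layer_kv), "bytes fetched for layer 0")
```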
Alternatives and similar repositories for InfiniStore:
Users interested in InfiniStore are comparing it to the libraries listed below.
- Stateful LLM Serving ☆46 · Updated 6 months ago
- ☆36 · Updated 2 months ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆58 · Updated 9 months ago
- High performance Transformer implementation in C++. ☆103 · Updated last month
- Vector search with bounded performance. ☆34 · Updated last year
- An interference-aware scheduler for fine-grained GPU sharing ☆123 · Updated 3 weeks ago
- NCCL Profiling Kit ☆127 · Updated 7 months ago
- Artifact of the OSDI '24 paper “Llumnix: Dynamic Scheduling for Large Language Model Serving” ☆60 · Updated 8 months ago
- Fast OS-level support for GPU checkpoint and restore ☆153 · Updated this week
- Microsoft Collective Communication Library ☆62 · Updated 2 months ago
- ☆26 · Updated last month
- ☆43 · Updated 7 months ago
- Efficient and easy multi-instance LLM serving ☆298 · Updated this week
- A resilient distributed training framework ☆88 · Updated 10 months ago
- High-performance RDMA-based distributed feature collection component for training GNN models on extremely large graphs ☆50 · Updated 2 years ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆112 · Updated 11 months ago
- ☆50 · Updated 8 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆78 · Updated 3 months ago
- [OSDI '24] Serving LLM-based Applications Efficiently with Semantic Variable ☆145 · Updated 5 months ago
- Ultra | Ultimate | Unified CCL ☆32 · Updated last week
- ☆77 · Updated last month
- ☆11 · Updated 8 months ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆17 · Updated this week
- ☆83 · Updated 3 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆297 · Updated this week
- The driver for LMCache core to run in vLLM ☆29 · Updated 2 weeks ago
- Thunder Research Group's Collective Communication Library ☆33 · Updated 9 months ago
- ☆43 · Updated 3 years ago
- ☆75 · Updated 2 years ago
- ⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance ☆52 · Updated 2 weeks ago