taco-project / FlexKV
☆127 · Updated this week
Alternatives and similar repositories for FlexKV
Users interested in FlexKV are comparing it to the libraries listed below.
- KV cache store for distributed LLM inference ☆372 · Updated last month
- Efficient and easy multi-instance LLM serving ☆517 · Updated 3 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆753 · Updated this week
- GLake: optimizing GPU memory management and IO transmission ☆491 · Updated 8 months ago
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆701 · Updated 2 weeks ago
- NVIDIA NCCL Tests for Distributed Training ☆129 · Updated this week
- OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) ☆334 · Updated this week
- Offline optimization of your disaggregated Dynamo graph ☆121 · Updated this week
- Fast OS-level support for GPU checkpoint and restore ☆260 · Updated 2 months ago
- CUDA checkpoint and restore utility ☆396 · Updated 3 months ago
- ☆329 · Updated last month
- Disaggregated serving system for Large Language Models (LLMs) ☆749 · Updated 8 months ago
- A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from trainin… ☆109 · Updated last week
- DeepSeek-V3/R1 inference performance simulator ☆169 · Updated 8 months ago
- Stateful LLM Serving ☆89 · Updated 9 months ago
- A workload for deploying LLM inference services on Kubernetes ☆136 · Updated this week
- The driver for LMCache core to run in vLLM ☆59 · Updated 10 months ago
- Fast and memory-efficient exact attention ☆104 · Updated this week
- A low-latency, high-throughput serving engine for LLMs ☆456 · Updated 2 months ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation ☆118 · Updated 6 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments ☆73 · Updated this week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆247 · Updated this week
- ☆142 · Updated last year
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆133 · Updated last year
- ☆433 · Updated 2 months ago
- ☆135 · Updated this week
- An efficient GPU resource sharing system with fine-grained control for Linux platforms ☆87 · Updated last year
- Compare different hardware platforms via the Roofline Model for LLM inference tasks ☆119 · Updated last year
- Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing ☆49 · Updated 4 months ago
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆130 · Updated 2 months ago