LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆36 (updated 2 months ago)
Alternatives and similar repositories for lmcache-vllm:
Users interested in lmcache-vllm are comparing it to the libraries listed below.
- Stateful LLM Serving ☆58 (updated last month)
- LLM Serving Performance Evaluation Harness ☆75 (updated last month)
- ☆45 (updated 9 months ago)
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆153 (updated 6 months ago)
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆112 (updated 4 months ago)
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆93 (updated last year)
- Perplexity GPU Kernels ☆185 (updated last week)
- ☆56 (updated 10 months ago)
- ☆96 (updated 6 months ago)
- A low-latency & high-throughput serving engine for LLMs ☆341 (updated 2 months ago)
- ☆78 (updated 2 weeks ago)
- DeepSeek-V3/R1 inference performance simulator ☆106 (updated 2 weeks ago)
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆125 (updated 2 weeks ago)
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆72 (updated 7 months ago)
- Modular and structured prompt caching for low-latency LLM inference ☆89 (updated 5 months ago)
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆113 (updated last year)
- ☆54 (updated 6 months ago)
- ☆97 (updated 3 months ago)
- ☆94 (updated 5 months ago)
- PyTorch distributed training acceleration framework ☆47 (updated 2 months ago)
- ☆68 (updated 4 months ago)
- High performance Transformer implementation in C++. ☆115 (updated 2 months ago)
- KV cache compression for high-throughput LLM inference ☆126 (updated 2 months ago)
- KV cache store for distributed LLM inference ☆136 (updated last week)
- Fast and memory-efficient exact attention ☆59 (updated this week)
- ☆49 (updated 4 months ago)
- A minimal implementation of vllm. ☆37 (updated 8 months ago)
- Efficient and easy multi-instance LLM serving ☆367 (updated this week)
- Benchmark suite for LLMs from Fireworks.ai ☆70 (updated 2 months ago)
- ☆11 (updated this week)