zcli-charlie / Awesome-KV-CacheLinks

☆75

Alternatives and similar repositories for Awesome-KV-Cache

Users that are interested in Awesome-KV-Cache are comparing it to the libraries listed below

Sorting:

smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆102Updated 3 months ago
YaoJiayi / CacheBlend
☆125Updated 3 weeks ago
TreeAI-Lab / Awesome-KV-Cache-Management
This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co…
☆172Updated last week
microsoft / chunk-attention
☆78Updated 3 months ago
FasterDecoding / SnapKV
☆268Updated 3 weeks ago
Zefan-Cai / Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
☆343Updated 5 months ago
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of …
☆238Updated last month
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆51Updated 9 months ago
dilab-zju / self-speculative-decoding
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆199Updated 5 months ago
d-matrix-ai / keyformer-llm
☆54Updated last year
FFY0 / AdaKV
The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
☆87Updated last month
zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆90Updated 2 years ago
UChi-JCL / CacheGen
☆115Updated 9 months ago
thunlp / FR-Spec
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
☆36Updated 3 weeks ago
HarryWu99 / llm_kvcache_sparsity
Implement some method of LLM KV Cache Sparsity
☆35Updated last year
yale-sys / prompt-cache
Modular and structured prompt caching for low-latency LLM inference
☆97Updated 8 months ago
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆311Updated 3 weeks ago
NJUNLP / MCSD
Multi-Candidate Speculative Decoding
☆36Updated last year
mutonix / pyramidinfer
☆43Updated 8 months ago
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆103Updated 4 months ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆123Updated 8 months ago
hyx1999 / SAM-Decoding
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
☆30Updated 5 months ago
LiuXiaoxuanPKU / OSD
☆54Updated 8 months ago
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆170Updated 10 months ago
LoongServe / LoongServe
☆109Updated 8 months ago
Equationliu / Kangaroo
[NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…
☆59Updated last year
thu-pacman / FasterMoE
☆86Updated 3 years ago
henryzhongsc / longctx_bench
Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…
☆81Updated 5 months ago
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM
☆165Updated last year
hemingkx / Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆299Updated 3 months ago