snu-mllab / KVzip
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
☆146 · Updated 2 weeks ago
Alternatives and similar repositories for KVzip
Users interested in KVzip are comparing it to the repositories listed below.
- Training-free, post-training, efficient sub-quadratic-complexity attention, implemented with OpenAI Triton. ☆148 · Updated last week
- ☆85 · Updated this week
- ☆60 · Updated 6 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models. ☆223 · Updated last week
- Work in progress. ☆75 · Updated 4 months ago
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness. ☆50 · Updated 6 months ago
- ☆81 · Updated 5 months ago
- ☆101 · Updated 2 months ago
- Ling-V2 is an MoE LLM provided and open-sourced by InclusionAI. ☆223 · Updated last month
- ☆62 · Updated 4 months ago
- KV cache compression for high-throughput LLM inference. ☆143 · Updated 9 months ago
- [NeurIPS 2025] A simple extension to vLLM that speeds up reasoning models without training.