FFY0/AdaKV

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/FFY0/AdaKV)

FFY0 / AdaKV

The Official Implementation of Ada-KV [NeurIPS 2025]

☆139

Alternatives and similar repositories for AdaKV

Users that are interested in AdaKV are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

FFY0 / DefensiveKV
View on GitHub
Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference
☆54Mar 28, 2026Updated 3 months ago
FasterDecoding / SnapKV
View on GitHub
☆324Jul 10, 2025Updated last year
antgroup / cakekv
View on GitHub
☆39Mar 17, 2025Updated last year
Linking-ai / SCOPE
View on GitHub
(ACL2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation
☆36May 28, 2025Updated last year
FYYFU / HeadKV
View on GitHub
[ICLR2025] Code and data for paper: Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasonin…
☆45Mar 10, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
mutonix / pyramidinfer
View on GitHub
☆47Nov 25, 2024Updated last year
IsaacRe / vllm-kvcompress
View on GitHub
KV cache compression for high-throughput LLM inference
☆158Feb 5, 2025Updated last year
October2001 / Awesome-KV-Cache-Compression
View on GitHub
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
☆726Apr 15, 2026Updated 3 months ago
ByteDance-Seed / ShadowKV
View on GitHub
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
☆310May 1, 2025Updated last year
snu-comparch / InfiniGen
View on GitHub
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆192Jul 10, 2024Updated 2 years ago
hdong920 / LESS
View on GitHub
☆53May 13, 2024Updated 2 years ago
NVIDIA / kvpress
View on GitHub
LLM KV cache compression made easy
☆1,142Jul 9, 2026Updated last week
snu-mllab / KVzip
View on GitHub
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
☆225Feb 11, 2026Updated 5 months ago
AnswerDotAI / cold-compress
View on GitHub
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…
☆153Aug 9, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Infini-AI-Lab / MagicPIG
View on GitHub
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
☆255Dec 16, 2024Updated last year
d-matrix-ai / keyformer-llm
View on GitHub
Keyformer proposes KV Cache reduction through key tokens identification and without the need for fine-tuning
☆57Mar 26, 2024Updated 2 years ago
jy-yuan / KIVI
View on GitHub
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆419Nov 20, 2025Updated 8 months ago
Zefan-Cai / Awesome-LLM-KV-Cache
View on GitHub
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
☆460Jun 17, 2026Updated last month
mit-han-lab / Quest
View on GitHub
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆400Jul 10, 2025Updated last year
sail-sg / SimLayerKV
View on GitHub
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆54Oct 18, 2024Updated last year
shadowpa0327 / Palu
View on GitHub
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
☆158Feb 20, 2025Updated last year
FMInference / H2O
View on GitHub
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
☆530Aug 1, 2024Updated last year
tsinghua-ideal / Twilight
View on GitHub
[NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning
☆105Jul 8, 2026Updated last week
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
xuyang-liu16 / MixKV
View on GitHub
[ICLR 2026] Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
☆29Mar 21, 2026Updated 4 months ago
Dominic789654 / LongGenBench
View on GitHub
Source code for the paper "LongGenBench: Long-context Generation Benchmark"
☆24Oct 8, 2024Updated last year
mit-han-lab / duo-attention
View on GitHub
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆539Feb 10, 2025Updated last year
Zefan-Cai / KVCache-Factory
View on GitHub
Unified KV Cache Compression Methods for Auto-Regressive Models
☆1,355Jul 10, 2026Updated last week
opengear-project / GEAR
View on GitHub
GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM
☆183Jul 12, 2024Updated 2 years ago
HarryWu99 / llm_kvcache_sparsity
View on GitHub
Implement some method of LLM KV Cache Sparsity
☆41Jun 6, 2024Updated 2 years ago
HugoZHL / PQCache
View on GitHub
[SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference
☆91Dec 7, 2025Updated 7 months ago
zyxxmu / cam
View on GitHub
Pytorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference
☆50Jun 19, 2024Updated 2 years ago
zju-jiyicheng / SpecVLM
View on GitHub
[EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
☆48Apr 16, 2026Updated 3 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
AIoT-MLSys-Lab / D2O
View on GitHub
[ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
☆27Jul 7, 2025Updated last year
SuDIS-ZJU / Efficient-LVLMs-Inference
View on GitHub
[ACL 2026 Findings] Living repository for the survey paper “Efficient Inference for Large Vision-Language Models: Bottlenecks, Techniques…
☆26Apr 8, 2026Updated 3 months ago
DerrickYLJ / TidalDecode
View on GitHub
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆57Aug 6, 2025Updated 11 months ago
XMUDeepLIT / QGC
View on GitHub
Code for "Retaining Key Information under High Compression Rates: Query-Guided Compressor for LLMs" (ACL 2024)
☆20Jun 12, 2024Updated 2 years ago
PiotrNawrot / sparse-frontier
View on GitHub
The evaluation framework for training-free sparse attention in LLMs
☆127Jan 27, 2026Updated 5 months ago
AIoT-MLSys-Lab / MEDA
View on GitHub
[NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
☆22Jun 19, 2025Updated last year
CR400AF-A / SparseMM
View on GitHub
[ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
☆88Jan 17, 2026Updated 6 months ago