FFY0/DefensiveKV

Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference

☆56

Alternatives and similar repositories for DefensiveKV

Users interested in DefensiveKV are comparing it to the libraries listed below.

FFY0 / AdaKV
The Official Implementation of Ada-KV [NeurIPS 2025]
☆139 • Nov 26, 2025 • Updated 7 months ago
antgroup / cakekv
☆39 • Mar 17, 2025 • Updated last year
apple / ml-epicache
☆30 • Oct 2, 2025 • Updated 9 months ago
hemingkx / Whisper
[ACL 2026] Enabling Efficient Reasoning in LLMs via Black-box Persuasive Prompting
☆22 • Jan 9, 2026 • Updated 6 months ago
66RING / CritiPrefill
Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".
☆17 • Sep 15, 2024 • Updated last year
AIoT-MLSys-Lab / D2O
[ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
☆27 • Jul 7, 2025 • Updated last year
amy-77 / ParisKV
🔥 [ICML'26] ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs
☆30 • Jun 29, 2026 • Updated 3 weeks ago
October2001 / Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
☆726 • Apr 15, 2026 • Updated 3 months ago
zhzihao / Learning-to-Draft
Official implementation of "Learning To Draft: Adaptive Speculative Decoding with Reinforcement Learning" (ICLR 2026)
☆19 • Mar 1, 2026 • Updated 4 months ago
Janghyun1230 / FastKVzip
Accurate and fast KV cache compression with a gating mechanism
☆27 • Apr 5, 2026 • Updated 3 months ago
antgroup / OmniKV
Dynamic Context Selection for Efficient Long-Context LLMs
☆63 • May 20, 2025 • Updated last year
NVlabs / RocketKV
[ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
☆50 • Aug 7, 2025 • Updated 11 months ago
mohamed / roofline
A simple script to plot the Roofline model for given HW platforms and applications
☆10 • Mar 17, 2026 • Updated 4 months ago
xuyang-liu16 / MixKV
[ICLR 2026] Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
☆29 • Mar 21, 2026 • Updated 4 months ago
wzq016 / PINE
Official Repo of Paper "Eliminating Position Bias of Language Models: A Mechanistic Approach"
☆23 • Jun 13, 2025 • Updated last year
HarryWu99 / llm_kvcache_sparsity
Implements some methods of LLM KV Cache Sparsity
☆41 • Jun 6, 2024 • Updated 2 years ago
lukewys / dcase_2020_T6
2nd place solution for 2020 DCASE challenge task 6 audio captioning. http://dcase.community/challenge2020/task-automatic-audio-captioning…
☆24 • Aug 3, 2023 • Updated 2 years ago
SuDIS-ZJU / Efficient-LVLMs-Inference
[ACL 2026 Findings] Living repository for the survey paper “Efficient Inference for Large Vision-Language Models: Bottlenecks, Techniques…
☆26 • Apr 8, 2026 • Updated 3 months ago
Dominic789654 / LongGenBench
Source code for the paper "LongGenBench: Long-context Generation Benchmark"
☆24 • Oct 8, 2024 • Updated last year
SalesforceAIResearch / ThinK
ThinK: Thinner Key Cache by Query-Driven Pruning
☆30 • Jun 2, 2026 • Updated last month
tsinghua-ideal / Twilight
[NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning
☆105 • Jul 8, 2026 • Updated 2 weeks ago
qipengwang / Melon
MobiSys#114
☆23 • Aug 17, 2023 • Updated 2 years ago
RahulSChand / Weighted-low-rank-factorization-Pytorch
PyTorch implementation of Language model compression with weighted low-rank factorization
☆14 • Jun 28, 2023 • Updated 3 years ago
hdong920 / LESS
☆53 • May 13, 2024 • Updated 2 years ago
mozhu621 / LongGenBench
☆37 • Oct 4, 2025 • Updated 9 months ago
real-absolute-AI / RAPID
[ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding
☆23 • Mar 2, 2025 • Updated last year
MinkaiXu / AliDiff
NeurIPS24: Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization
☆43 • Apr 2, 2025 • Updated last year
TemporaryLoRA / Block-Attention
☆48 • Mar 15, 2025 • Updated last year
gmlwns2000 / sea-attention
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
☆12 • Jun 20, 2025 • Updated last year
snu-mllab / KVzip
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
☆225 • Feb 11, 2026 • Updated 5 months ago
GUET-PDK / pdk-mini
GUET "Run Fast" WeChat mini program: a campus errand-running system (software engineering course project, class of 2020)
☆14 • Jun 20, 2023 • Updated 3 years ago
facebookresearch / SecureFLCompression
Compression primitives for uplink compression in Federated Learning that are compatible with Secure Aggregation.
☆11 • Jul 27, 2022 • Updated 3 years ago
A4Bio / MotifRetro
The official implementation of the paper "MotifRetro: Exploring the Combinability-Consistency Trade-offs in retrosynthesis via Dynamic Mo…
☆11 • Jun 25, 2023 • Updated 3 years ago
HugoZHL / PQCache
[SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference
☆91 • Dec 7, 2025 • Updated 7 months ago
goodevening13 / aquakv
☆21 • Apr 27, 2026 • Updated 2 months ago
mbalesni / deepspeed_llama
Finetuning LLaMA with DeepSpeed
☆10 • Apr 14, 2023 • Updated 3 years ago
yongzhuo / InternLM-SFT
InternLM-7B fine-tuning, SFT/LoRA, instruction finetune
☆13 • May 17, 2024 • Updated 2 years ago
ToyoDAdoubiBackup / SSRStatus
Online monitoring of Shadowsocks/ShadowsocksR accounts
☆12 • Nov 25, 2018 • Updated 7 years ago
kyegomez / MGQA
The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Model…
☆17 • Dec 11, 2023 • Updated 2 years ago