Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference
☆45Mar 28, 2026Updated 2 months ago
Alternatives and similar repositories for DefensiveKV
Users that are interested in DefensiveKV are comparing it to the libraries listed below.
- Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".☆17Sep 15, 2024Updated last year
- The Official Implementation of Ada-KV [NeurIPS 2025]☆135Nov 26, 2025Updated 6 months ago
- PyTorch implementation of Language model compression with weighted low-rank factorization☆14Jun 28, 2023Updated 2 years ago
- Official Implementation of Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training☆188May 5, 2026Updated last month
- ☆48Mar 15, 2025Updated last year
- Official implementation for LaCo (EMNLP 2024 Findings)☆22Oct 3, 2024Updated last year
- LLM-guided hyperparameter tuning☆10Oct 7, 2023Updated 2 years ago
- ☆11Feb 15, 2023Updated 3 years ago
- The raw data and analysis code for the Microsoft Academic paper recommender system user study conducted in 2018.☆16May 21, 2019Updated 7 years ago
- My record about learning the course MIT-6.824☆13Mar 28, 2022Updated 4 years ago
- Project for CS101016 and CS100160, Tongji University. Use Verilog HDL to build a CPU.☆10Mar 20, 2021Updated 5 years ago
- ☆48May 16, 2026Updated 3 weeks ago
- TPLink IPC Control☆20Jul 24, 2024Updated last year
- ☆145Aug 18, 2025Updated 9 months ago
- ☆14Aug 3, 2024Updated last year
- ☆11Mar 9, 2022Updated 4 years ago
- Source code of "FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework"☆11Oct 23, 2024Updated last year
- Unofficial implementations of block/layer-wise pruning methods for LLMs.☆78Apr 29, 2024Updated 2 years ago
- A game like QQTang (QQ堂)☆15Dec 8, 2022Updated 3 years ago
- ThinK: Thinner Key Cache by Query-Driven Pruning☆30Jun 2, 2026Updated last week
- Incorporating the memory mechanism into the transformer and employing a parallel weighting structure to obtain a better utterance-level r…☆22Oct 4, 2025Updated 8 months ago
- ☆12Mar 24, 2025Updated last year
- ☆16Feb 20, 2024Updated 2 years ago
- ☆107Sep 10, 2025Updated 9 months ago
- ☆17Jun 25, 2024Updated last year
- ☆17Apr 13, 2025Updated last year
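- Code repository for the Fall 2023 Engineering C Programming course, including the lab code and lab reports for lab1-5. If you are interested, give it a star~☆10Mar 1, 2025Updated last year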
- ☆11Oct 10, 2021Updated 4 years ago
- [EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning☆44Apr 16, 2026Updated last month
- Computational analysis of nucleic acids structures using graph neural networks☆15Mar 25, 2024Updated 2 years ago
- The 54-instruction CPU required by the Computer Organization lab at Tongji University☆13Feb 27, 2020Updated 6 years ago
- A collection of papers on LLM applications in the IoT field.☆19Jan 21, 2026Updated 4 months ago
- Experimental deep learning framework written in Rust☆15Nov 2, 2022Updated 3 years ago
- The agentic plotting "IDE" built for everyone☆62Apr 2, 2026Updated 2 months ago
- [NeurIPS 2024] State Space Models on Temporal Graphs: A First-Principles Study☆16Dec 31, 2024Updated last year
- ☆18Apr 21, 2024Updated 2 years ago
- ☆23Jan 31, 2025Updated last year
- Compiler Principles lab assignments, South China Normal University, class of 2022☆16Jun 16, 2024Updated last year
- W8A8/W4A8 inference + optimized SDPA on Apple Silicon — unlocking unused INT8 TensorOps in M5 for 1.2–1.9× faster LLM prefill, plus Flash…☆428Jun 5, 2026Updated last week