☆38Mar 17, 2025Updated last year
Alternatives and similar repositories for cakekv
Users that are interested in cakekv are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [ICLR2025] Code and data for paper: Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasonin…☆43Mar 10, 2025Updated last year
- ☆317Jul 10, 2025Updated 10 months ago
- (ACL2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation☆35May 28, 2025Updated last year
- The Official Implementation of Ada-KV [NeurIPS 2025]☆131Nov 26, 2025Updated 6 months ago
- [ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models☆27Jul 7, 2025Updated 10 months ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- ☆18Apr 15, 2025Updated last year
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗).☆709Apr 15, 2026Updated last month
- [ICLR 2025] The official pytorch implement of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…☆72Sep 18, 2025Updated 8 months ago
- official code for GliDe with a CaPE☆21Aug 13, 2024Updated last year
- Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision☆19Apr 1, 2025Updated last year
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…☆151Aug 9, 2024Updated last year
- 使用onnxruntime部署Diffusion-Low-Light低光照图像增强,包含C++和Python两个版本的程序☆13Aug 18, 2024Updated last year
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…☆1,214Apr 8, 2026Updated last month
- Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)☆12Jun 20, 2025Updated 11 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆20Jun 17, 2024Updated last year
- ☆16Jul 31, 2025Updated 9 months ago
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality☆239Aug 2, 2024Updated last year
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference☆386Jul 10, 2025Updated 10 months ago
- LongAttn :Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 10 months ago
- Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference☆43Mar 28, 2026Updated 2 months ago
- Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models.☆10May 16, 2024Updated 2 years ago
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking☆39Jan 20, 2026Updated 4 months ago
- ☆47Nov 25, 2024Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- ☆15Apr 11, 2024Updated 2 years ago
- ☆16May 22, 2025Updated last year
- ☆11May 19, 2025Updated last year
- Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference☆168Oct 13, 2025Updated 7 months ago
- [arXiv 2024] Is Oracle Pruning the True Oracle?☆26Jan 10, 2025Updated last year
- Tender: Accelerating Large Language Models via Tensor Decompostion and Runtime Requantization (ISCA'24)☆32Jul 4, 2024Updated last year
- Offcial Repo of Paper "Eliminating Position Bias of Language Models: A Mechanistic Approach""☆23Jun 13, 2025Updated 11 months ago
- Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)☆63Apr 18, 2024Updated 2 years ago
- ☆15Jun 6, 2023Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference☆299May 1, 2025Updated last year
- 基于Bart语言模型的指针生成网络,用于中文语法纠错任务☆16Sep 8, 2022Updated 3 years ago
- ☆12Feb 28, 2025Updated last year
- The code based on vLLM for the paper “ Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”.☆11Sep 19, 2024Updated last year
- ☆14Sep 28, 2021Updated 4 years ago
- STREAMer: Benchmarking remote volatile and non-volatile memory bandwidth☆18Aug 21, 2023Updated 2 years ago
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding☆278Aug 31, 2024Updated last year