ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25)
☆25Feb 26, 2026Updated last week
Alternatives and similar repositories for ClusterKV
Users that are interested in ClusterKV are comparing it to the libraries listed below
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference☆57Nov 20, 2024Updated last year
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)☆53Dec 17, 2024Updated last year
- The Official Implementation of Ada-KV [NeurIPS 2025]☆128Nov 26, 2025Updated 3 months ago
- ☆37Oct 16, 2025Updated 4 months ago
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025)☆30Jun 14, 2024Updated last year
- The evaluation framework for training-free sparse attention in LLMs☆121Jan 27, 2026Updated last month
- An efficient implementation of the NSA (Native Sparse Attention) kernel☆129Jun 24, 2025Updated 8 months ago
- Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference☆22Feb 9, 2026Updated last month
- Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?"☆48Jul 29, 2025Updated 7 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs☆194Sep 23, 2025Updated 5 months ago
- An experimentation platform for LLM inference optimisation☆36Sep 19, 2024Updated last year
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference☆283May 1, 2025Updated 10 months ago
- ☆12Jul 4, 2024Updated last year
- ☆20May 24, 2025Updated 9 months ago
- ☆15Jan 27, 2026Updated last month
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space"☆25Jul 21, 2025Updated 7 months ago
- A repo to keep all resources about interpretability in NLP organised and up to date☆12Nov 22, 2020Updated 5 years ago
- Scikit-learn vectorizer implementing "A simple but tough-to-beat baseline for sentence embeddings." by Arora, Sanjeev, Yingyu Liang, and …☆12Apr 1, 2018Updated 7 years ago
- A Vue demo: a Vue clone of the NetEase News mobile site☆10Jul 26, 2017Updated 8 years ago
- ☆13Sep 8, 2024Updated last year
- ☆11Jan 17, 2024Updated 2 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Mar 18, 2023Updated 2 years ago
- a simple API to use CUPTI☆11Aug 19, 2025Updated 6 months ago
- ☆13Jul 8, 2020Updated 5 years ago
- ☆20Aug 14, 2025Updated 6 months ago
- ☆12Jul 6, 2023Updated 2 years ago
- PyTorch implementation of paper "Evolving Parameterized Prompt Memory for Continual Learning" in AAAI 2024 (Oral).☆14Apr 15, 2024Updated last year
- ☆31Nov 18, 2025Updated 3 months ago
- A simple CNN framework implemented in C++ to train on MNIST data☆10Mar 29, 2022Updated 3 years ago
- Official codes for COLING 2024 paper "Robust and Scalable Model Editing for Large Language Models": https://arxiv.org/abs/2403.17431v1☆14Mar 27, 2024Updated last year
- ☆12Jun 29, 2024Updated last year
- ☆13Jul 2, 2025Updated 8 months ago
- ☆17May 30, 2025Updated 9 months ago
- Implementation for <Understanding Robust Overfitting of Adversarial Training and Beyond> in ICML'22.☆12Jul 1, 2022Updated 3 years ago
- ☆12Nov 15, 2022Updated 3 years ago
- [ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs☆17May 21, 2025Updated 9 months ago
- ☆13Nov 29, 2021Updated 4 years ago
- QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on …☆13Mar 25, 2024Updated last year
- An implementation for MetGen: A Module-Based Entailment Tree Generation Framework for Answer Explanation.☆13Jul 21, 2022Updated 3 years ago