sjtu-zhao-lab / ClusterKV
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25)
☆23 · Sep 15, 2025 · Updated 5 months ago
Alternatives and similar repositories for ClusterKV
Users interested in ClusterKV are comparing it to the repositories listed below.
- Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system. ☆122 · Jan 1, 2026 · Updated last month
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆56 · Nov 20, 2024 · Updated last year
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆53 · Dec 17, 2024 · Updated last year
- An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation ☆56 · Mar 22, 2024 · Updated last year
- ☆30 · Oct 4, 2025 · Updated 4 months ago
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆129 · Nov 26, 2025 · Updated 2 months ago
- ☆36 · Oct 16, 2025 · Updated 4 months ago
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025) ☆30 · Jun 14, 2024 · Updated last year
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference ☆82 · Dec 7, 2025 · Updated 2 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆119 · Jan 27, 2026 · Updated 2 weeks ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆129 · Jun 24, 2025 · Updated 7 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆193 · Sep 23, 2025 · Updated 4 months ago
- Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?" ☆48 · Jul 29, 2025 · Updated 6 months ago
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆283 · May 1, 2025 · Updated 9 months ago
- ☆20 · May 24, 2025 · Updated 8 months ago
- ☆12 · Jul 4, 2024 · Updated last year
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space" ☆24 · Jul 21, 2025 · Updated 6 months ago
- The official implementation of the paper "Self-Updatable Large Language Models by Integrating Context into Model Parameters" ☆15 · May 18, 2025 · Updated 8 months ago
- A repo to keep all resources about interpretability in NLP organised and up to date ☆12 · Nov 22, 2020 · Updated 5 years ago
- A Vue demo: a Vue clone of the NetEase News mobile site ☆10 · Jul 26, 2017 · Updated 8 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 2 years ago
- A library for handling Structural Causal Models and performing interventional and counterfactual inference on them. ☆11 · Jul 3, 2020 · Updated 5 years ago
- ☆18 · Jun 23, 2025 · Updated 7 months ago
- DiscreteTom's Blog Boilerplate. ☆10 · Mar 6, 2023 · Updated 2 years ago
- ☆13 · Jul 8, 2020 · Updated 5 years ago
- ☆13 · Sep 8, 2024 · Updated last year
- Scikit-learn vectorizer implementing "A simple but tough-to-beat baseline for sentence embeddings." by Arora, Sanjeev, Yingyu Liang, and … ☆12 · Apr 1, 2018 · Updated 7 years ago
- Kernel Library Wheel for SGLang ☆17 · Updated this week
- ☆12 · Nov 15, 2022 · Updated 3 years ago
- A collection of MoE (Mixture of Experts) papers, code, tools, etc. ☆12 · Mar 15, 2024 · Updated last year
- ☆13 · Jul 2, 2025 · Updated 7 months ago
- NTU (National Taiwan University) Coursera course: Machine Learning Foundations (机器学习基石) by Hsuan-Tien Lin (林轩田) ☆15 · Nov 23, 2018 · Updated 7 years ago
- A complete implementation of the LZW compression algorithm ☆10 · Aug 14, 2014 · Updated 11 years ago
- Continual Memorization of Factoids in Large Language Models ☆12 · Nov 20, 2024 · Updated last year
- ☆13 · Nov 29, 2021 · Updated 4 years ago
- ☆17 · May 21, 2025 · Updated 8 months ago
- ☆25 · Oct 11, 2025 · Updated 4 months ago
- Implementation for "Understanding Robust Overfitting of Adversarial Training and Beyond" (ICML'22) ☆12 · Jul 1, 2022 · Updated 3 years ago
- A simple CNN framework implemented in C++ to train on MNIST data. Done! ☆10 · Mar 29, 2022 · Updated 3 years ago