HugoZHL/PQCache

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/HugoZHL/PQCache)

HugoZHL / PQCache

[SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference

☆91

Alternatives and similar repositories for PQCache

Users that are interested in PQCache are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

microsoft / RetrievalAttention
View on GitHub
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.
☆147Feb 22, 2026Updated 4 months ago
fpgasystems / Chameleon-RAG-Acceleration
View on GitHub
☆23Jun 1, 2025Updated last year
iankur / vqllm
View on GitHub
Residual vector quantization for KV cache compression in large language model
☆12Oct 22, 2024Updated last year
Infini-AI-Lab / MagicPIG
View on GitHub
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
☆255Dec 16, 2024Updated last year
adslabcuhk / elect
View on GitHub
☆13Aug 1, 2025Updated 11 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
SqueezeAILab / SqueezedAttention
View on GitHub
[ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference
☆58Nov 20, 2024Updated last year
snu-comparch / InfiniGen
View on GitHub
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆192Jul 10, 2024Updated 2 years ago
pku-liang / ArkVale
View on GitHub
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆54Dec 17, 2024Updated last year
thustorage / GustANN
View on GitHub
High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU [SIGMOD'26]
☆30Apr 22, 2026Updated 2 months ago
sail-sg / SimLayerKV
View on GitHub
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆54Oct 18, 2024Updated last year
AlayaDB-AI / ParaGraph
View on GitHub
A cross-modal vector index with fast construction on heterogeneous CPU-GPU environment. Published on DaMoN@SIGMOD 2025.
☆16Jul 16, 2025Updated last year
pzs19 / TokenSelect
View on GitHub
☆20Mar 11, 2025Updated last year
mit-han-lab / Quest
View on GitHub
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆400Jul 10, 2025Updated last year
FasterDecoding / SnapKV
View on GitHub
☆324Jul 10, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
ZJU-DAILY / HNSW-Flash
View on GitHub
Source code for the paper: "Accelerating Graph Indexing for ANNS on Modern CPUs"
☆34Nov 9, 2025Updated 8 months ago
antgroup / cakekv
View on GitHub
☆39Mar 17, 2025Updated last year
TKONIY / tutorial-any-repo
View on GitHub
Claude Code skill: Generate file-by-file code tutorial websites for any repository with parallel agent teams
☆28Mar 13, 2026Updated 4 months ago
SimoneZeng / awesome-vector-ANN-search-papers
View on GitHub
A list of papers in the field of approximate nearest neighbor search on high-dimensional vectors.
☆135Jun 9, 2026Updated last month
DerrickYLJ / TidalDecode
View on GitHub
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆56Aug 6, 2025Updated 11 months ago
ytgui / PilotANN
View on GitHub
Memory-Bounded GPU Acceleration for Vector Search
☆33Dec 29, 2025Updated 6 months ago
gaurav16gupta / constrainedANN
View on GitHub
☆14Jan 20, 2025Updated last year
amy-77 / ParisKV
View on GitHub
🔥 [ICML'26] ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs
☆30Jun 29, 2026Updated 3 weeks ago
microsoft / SeerAttention
View on GitHub
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆213Jul 10, 2026Updated last week
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
NVlabs / RocketKV
View on GitHub
[ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
☆49Aug 7, 2025Updated 11 months ago
FFY0 / AdaKV
View on GitHub
The Official Implementation of Ada-KV [NeurIPS 2025]
☆139Nov 26, 2025Updated 7 months ago
SylviaZiyuZhang / CleANN
View on GitHub
Public repository for CleANN, an efficient fully-dynamic approximate nearest neighbor search index
☆16Sep 18, 2025Updated 10 months ago
YZ-Cai / Unified-Navigating-Graph
View on GitHub
Official implementation for paper "Navigating Labels and Vectors: A Unified Approach to Filtered Approximate Nearest Neighbor Search"
☆38Dec 21, 2024Updated last year
zilliztech / starling
View on GitHub
☆86Sep 4, 2024Updated last year
antgroup / OmniKV
View on GitHub
Dynamic Context Selection for Efficient Long-Context LLMs
☆63May 20, 2025Updated last year
thunlp / InfLLM
View on GitHub
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem…
☆405Apr 20, 2024Updated 2 years ago
PDS-Lab / Rcmp
View on GitHub
Rcmp: Reconstructing RDMA-based Memory Disaggregation via CXL
☆63Dec 26, 2023Updated 2 years ago
tsinghua-ideal / Twilight
View on GitHub
[NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning
☆105Jul 8, 2026Updated last week
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
andy-yang-1 / DoubleSparse
View on GitHub
16-fold memory access reduction with nearly no loss
☆107Mar 26, 2025Updated last year
ydyhello / TailorKV
View on GitHub
Official implementation of "TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization" (Findings of ACL …
☆21Jul 25, 2025Updated 11 months ago
cat538 / SKVQ
View on GitHub
[COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
☆24Oct 5, 2024Updated last year
LMCache / lmcache-agent-trace
View on GitHub
Agent application/benchmark/workload traces should be placed here.
☆15Apr 13, 2026Updated 3 months ago
AIS-SNU / PathWeaver
View on GitHub
A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search
☆21Jul 22, 2025Updated 11 months ago
intel / ScalableVectorSearch
View on GitHub
☆233Updated this week
thustorage / PipeANN
View on GitHub
A low-latency, billion-scale, and updatable graph-based vector store on SSD.
☆144Updated this week