antgroup/OmniKV

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/antgroup/OmniKV)

antgroup / OmniKV

Dynamic Context Selection for Efficient Long-Context LLMs

☆63

Alternatives and similar repositories for OmniKV

Users that are interested in OmniKV are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

pku-liang / ArkVale
View on GitHub
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆54Dec 17, 2024Updated last year
CURRENTF / LowRankClone
View on GitHub
[NeurIPS 2025 Spotlight] A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone.
☆49Oct 29, 2025Updated 9 months ago
SqueezeAILab / SqueezedAttention
View on GitHub
[ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference
☆58Nov 20, 2024Updated last year
nancheng58 / On-the-User-Behavior-Leakage-from-Recommender-System-Exposure
View on GitHub
[TOIS 2023] On the User Behavior Leakage from Recommender System Exposure
☆19Nov 7, 2023Updated 2 years ago
microsoft / RetrievalAttention
View on GitHub
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.
☆149Feb 22, 2026Updated 5 months ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
pzs19 / TokenSelect
View on GitHub
☆20Mar 11, 2025Updated last year
Furyton / GR-as-MVDR
View on GitHub
[SIGIR'24] Generative Retrieval as Multi-Vector Dense Retrieval
☆36Oct 18, 2024Updated last year
CURRENTF / HRN
View on GitHub
Code of Multi-Defendant Legal Judgment Prediction via Hierarchical Reasoning
☆15Apr 29, 2024Updated 2 years ago
snu-mllab / KVzip
View on GitHub
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
☆225Feb 11, 2026Updated 5 months ago
nancheng58 / DebiasedSR_DRO
View on GitHub
[WSDM 2024 Best Paper Honorable Mention] Debiasing Sequential Recommenders through Distributionally Robust Optimization over System Expos…
☆16Jun 20, 2024Updated 2 years ago
ypeiyu / attribution_recalibration
View on GitHub
[ICLR 2023 Spotlight] Re-calibrating Feature Attributions for Model Interpretation
☆36Jan 21, 2026Updated 6 months ago
chaozhang-cs / hnsw-ada-ef
View on GitHub
Ada-ef (SIGMOD '26) — Adaptive efSearch for HNSW-based vector search
☆19Jun 19, 2026Updated last month
fpgasystems / Chameleon-RAG-Acceleration
View on GitHub
☆23Jun 1, 2025Updated last year
PiotrNawrot / sparse-frontier
View on GitHub
The evaluation framework for training-free sparse attention in LLMs
☆127Jan 27, 2026Updated 6 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
AIS-SNU / Smart-Infinity
View on GitHub
[HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
☆52Jul 21, 2025Updated last year
google / rago
View on GitHub
☆31Jun 22, 2025Updated last year
ByteDance-Seed / ShadowKV
View on GitHub
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
☆311May 1, 2025Updated last year
zyqCSL / DiffKV
View on GitHub
☆45Oct 11, 2025Updated 9 months ago
UNITES-Lab / HEXA-MoE
View on GitHub
Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy"
☆15Mar 6, 2025Updated last year
amy-77 / ParisKV
View on GitHub
🔥 [ICML'26] ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs
☆30Jun 29, 2026Updated last month
thustorage / GustANN
View on GitHub
High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU [SIGMOD'26]
☆30Apr 22, 2026Updated 3 months ago
HugoZHL / PQCache
View on GitHub
[SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference
☆91Dec 7, 2025Updated 7 months ago
FFY0 / AdaKV
View on GitHub
The Official Implementation of Ada-KV [NeurIPS 2025]
☆139Nov 26, 2025Updated 8 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
CURRENTF / MEFT
View on GitHub
☆22Jun 11, 2024Updated 2 years ago
Infini-AI-Lab / MagicPIG
View on GitHub
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
☆255Dec 16, 2024Updated last year
menik1126 / ParallelComp
View on GitHub
[ICML 2025🔥] ParallelComp: Parallel Long-Context Compressor for Length Extrapolation
☆30Jun 16, 2025Updated last year
cs-qyzhang / CEMU
View on GitHub
CEMU: Enabling Full-System Emulation of Computational Storage beyond Hardware Limits (ASPLOS'26)
☆18Dec 31, 2025Updated 6 months ago
XMUDeepLIT / QGC
View on GitHub
Code for "Retaining Key Information under High Compression Rates: Query-Guided Compressor for LLMs" (ACL 2024)
☆19Jun 12, 2024Updated 2 years ago
nancheng58 / SSL4SR
View on GitHub
[CCIR 2023] Self-supervised learning for Sequential Recommender Systems
☆25Nov 7, 2023Updated 2 years ago
nancheng58 / RecMamba
View on GitHub
Uncovering Selective State Space Model's Capabilities in Lifelong Sequential Recommendation
☆35May 8, 2024Updated 2 years ago
LZKSKY / CaSE_RISE
View on GitHub
This is the implementation of paper "Learning to Ask Conversational Questions by Optimizing Levenshtein Distance".
☆10Jul 5, 2021Updated 5 years ago
ByteDance-Seed / FlexPrefill
View on GitHub
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆170Oct 13, 2025Updated 9 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
Leo9660 / HedraRAG_AE
View on GitHub
Artifact Evaluation for SOSP 2025
☆22Aug 16, 2025Updated 11 months ago
FFY0 / DefensiveKV
View on GitHub
Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference
☆56Mar 28, 2026Updated 4 months ago
PanZaifeng / KVFlow
View on GitHub
☆28Mar 12, 2026Updated 4 months ago
DBGroup-SUSTech / multi-vector-retrieval
View on GitHub
☆14Apr 19, 2026Updated 3 months ago
FMInference / H2O
View on GitHub
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
☆530Aug 1, 2024Updated last year
PKU-SDS-lab / POQD-ICML25
View on GitHub
☆16Aug 28, 2025Updated 11 months ago
OpenMOSS / Sparse-dLLM
View on GitHub
☆29Oct 16, 2025Updated 9 months ago