yangyifei729/KVSharer

yangyifei729 / KVSharer

Source code of the paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing"

☆31

Alternatives and similar repositories for KVSharer

Users who are interested in KVSharer are comparing it to the repositories listed below.

mutonix / pyramidinfer
☆47 · Nov 25, 2024 · Updated last year
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆54 · Oct 18, 2024 · Updated last year
AkideLiu / MiniCache
☆14 · Sep 7, 2024 · Updated last year
AIoT-MLSys-Lab / MEDA
[NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
☆22 · Jun 19, 2025 · Updated last year
metacarbon / shareAtt
Beyond KV Caching: Shared Attention for Efficient LLMs
☆20 · Jul 19, 2024 · Updated 2 years ago
Linking-ai / SCOPE
(ACL2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation
☆36 · May 28, 2025 · Updated last year
dongwonjo / FastKV
[ACL Findings 2026] Official Implementation of "FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acc…
☆32 · Apr 14, 2026 · Updated 3 months ago
SUSTechBruce / LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆103 · Nov 9, 2024 · Updated last year
facebookresearch / ToMi
Code accompanying our EMNLP 2019 paper: "Revisiting the Evaluation of Theory of Mind through Question Answering"
☆29 · Aug 9, 2020 · Updated 5 years ago
menik1126 / UNComp
[EMNLP 2025🔥] UNComp: Can Matrix Entropy Uncover Sparsity? -- A Compressor Design from an Uncertainty-Aware Perspective
☆20 · Jan 7, 2026 · Updated 6 months ago
TerryPei / CSP
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference
☆10 · Dec 15, 2024 · Updated last year
chtmp223 / suri
Suri: Multi-constraint instruction following for long-form text generation [EMNLP’24]
☆27 · Oct 3, 2025 · Updated 9 months ago
ElvishElvis / LCA-on-the-line
LCA-on-the-line (ICML 2024 Oral)
☆14 · Feb 13, 2025 · Updated last year
li-jl16 / LORS
CVPR2024 highlight.
☆13 · Oct 10, 2024 · Updated last year
linxihui / dkernel
☆22 · Apr 17, 2025 · Updated last year
mrhrifat / al-quran
Al Quran is the holy book of Islam. Muslims believe that the Quran was revealed by Allah (SWT) to the final prophet & messenger, Muhammad…
☆12 · Apr 30, 2023 · Updated 3 years ago
Lyun0912-wu / LongAttn
LongAttn: Selecting Long-context Training Data via Token-level Attention
☆15 · Jul 16, 2025 · Updated last year
zepingyu0512 / arithmetic-mechanism
code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
☆12 · Nov 17, 2024 · Updated last year
zhuyunqi96 / LoraLPrun
☆13 · May 21, 2023 · Updated 3 years ago
ChristophAlt / fewrel
Few-Shot Relation Extraction with AllenNLP
☆12 · Jan 27, 2019 · Updated 7 years ago
iLearn-Lab / ACL25-PTQ1.61
☆15 · Apr 6, 2026 · Updated 3 months ago
ShopeeLLM / Spec-RL
SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts
☆66 · Dec 1, 2025 · Updated 7 months ago
hpcgroup / loki
Algorithms for approximate attention in LLMs
☆22 · Apr 14, 2025 · Updated last year
66RING / CritiPrefill
Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".
☆17 · Sep 15, 2024 · Updated last year
mlbio-epfl / joint-inference
[ICLR 2025] Large (Vision) Language Models are Unsupervised In-Context Learners
☆22 · Jun 6, 2025 · Updated last year
ablghtianyi / ICL_Modular_Arithmetic
☆19 · Mar 25, 2025 · Updated last year
Zanette-Labs / SpeculativeRejection
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
☆56 · Oct 29, 2024 · Updated last year
VITA-Group / llm-kick
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing llms: The truth is rarely pure and never simple.
☆27 · Apr 21, 2025 · Updated last year
letsgoLakers / NCIFD
An ethnic-culture dataset for large models
☆13 · May 26, 2025 · Updated last year
AI9Stars / SpecMQuant
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design
☆23 · May 29, 2025 · Updated last year
FMInference / H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
☆530 · Aug 1, 2024 · Updated last year
Adaxry / Unified_Layer_Skipping
☆15 · Apr 11, 2024 · Updated 2 years ago
cchao0116 / CTSMA-ICML21
Code for ICML21 paper "Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation"
☆13 · Feb 8, 2023 · Updated 3 years ago
XiaoyuanXie / xiaoyuanxie.github.io
Personal Page
☆12 · Jul 4, 2026 · Updated 3 weeks ago
init0xyz / AdaCQR
Implementation of AdaCQR (COLING 2025)
☆15 · Dec 30, 2024 · Updated last year
JoakimHaurum / ATC
Official PyTorch implementation of Agglomerative Token Clustering presented at ECCV 2024
☆20 · Sep 19, 2024 · Updated last year
tim-roderick / VST
Video Summarization Transformer: Implementation in PyTorch of the Transformer model for video summarisation
☆10 · Oct 27, 2020 · Updated 5 years ago
yale-nlp / refdpo
☆16 · Jul 23, 2024 · Updated 2 years ago
ylsung / rsq
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
☆23 · Mar 25, 2026 · Updated 4 months ago