Source code of the paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing"
☆31 · Oct 24, 2024 · Updated last year
Alternatives and similar repositories for KVSharer
Users who are interested in KVSharer are comparing it to the libraries listed below.
- ☆47 · Nov 25, 2024 · Updated last year
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆50 · Oct 18, 2024 · Updated last year
- ☆11 · Sep 7, 2024 · Updated last year
- Official Implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration ☆30 · Nov 22, 2025 · Updated 4 months ago
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation ☆34 · May 28, 2025 · Updated 10 months ago
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Jul 19, 2024 · Updated last year
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference ☆18 · Jun 19, 2025 · Updated 9 months ago
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In… ☆104 · Nov 9, 2024 · Updated last year
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 8 months ago
- Cross-Self KV Cache Pruning for Efficient Vision-Language Inference ☆10 · Dec 15, 2024 · Updated last year
- LCA-on-the-line (ICML 2024 Oral) ☆14 · Feb 13, 2025 · Updated last year
- ☆20 · Oct 13, 2024 · Updated last year
- [ICLR 2025] Large (Vision) Language Models are Unsupervised In-Context Learners ☆22 · Jun 6, 2025 · Updated 9 months ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆27 · Oct 3, 2025 · Updated 5 months ago
- This is the open-source code for TokenCarve. ☆26 · Jan 23, 2026 · Updated 2 months ago
- CVPR2024 highlight. ☆13 · Oct 10, 2024 · Updated last year
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning (EMNLP 2025) ☆58 · Oct 10, 2025 · Updated 5 months ago
- ☆22 · Apr 17, 2025 · Updated 11 months ago
- ☆35 · Jun 3, 2025 · Updated 9 months ago
- ☆84 · Oct 9, 2024 · Updated last year
- Repository of IPBench ☆20 · Jan 4, 2026 · Updated 2 months ago
- Personal Page ☆12 · Mar 20, 2026 · Updated last week
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆55 · Jul 16, 2024 · Updated last year
- Code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis ☆12 · Nov 17, 2024 · Updated last year
- Algorithms for approximate attention in LLMs ☆22 · Apr 14, 2025 · Updated 11 months ago
- Few-Shot Relation Extraction with AllenNLP ☆12 · Jan 27, 2019 · Updated 7 years ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆55 · Oct 29, 2024 · Updated last year
- Ethnic culture dataset for large language models ☆12 · May 26, 2025 · Updated 10 months ago
- Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs". ☆16 · Sep 15, 2024 · Updated last year
- [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews ☆17 · Dec 14, 2025 · Updated 3 months ago
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding ☆14 · Jul 22, 2024 · Updated last year
- ☆15 · Apr 11, 2024 · Updated last year
- ☆21 · Oct 2, 2024 · Updated last year
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing llms: The truth is rarely pure and never simple. ☆27 · Apr 21, 2025 · Updated 11 months ago
- [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. ☆510 · Aug 1, 2024 · Updated last year
- ☆16 · May 22, 2025 · Updated 10 months ago
- Implementation of AdaCQR (COLING 2025) ☆14 · Dec 30, 2024 · Updated last year
- ☆19 · Mar 25, 2025 · Updated last year
- Official PyTorch implementation of Agglomerative Token Clustering presented at ECCV 2024 ☆20 · Sep 19, 2024 · Updated last year