xKV: Cross-Layer SVD for KV-Cache Compression
☆45 · Nov 30, 2025 · Updated 3 months ago
Alternatives and similar repositories for xKV
Users that are interested in xKV are comparing it to the libraries listed below.
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ☆155 · Feb 20, 2025 · Updated last year
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆50 · Oct 18, 2024 · Updated last year
- ☆27 · Nov 25, 2025 · Updated 4 months ago
- ☆47 · Nov 25, 2024 · Updated last year
- ☆20 · Jun 1, 2025 · Updated 9 months ago
- The official repository of Quamba1 [ICLR 2025] & Quamba2 [ICML 2025] ☆68 · Jun 19, 2025 · Updated 9 months ago
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation" ☆22 · Apr 22, 2025 · Updated 11 months ago
- SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs (ICML 2025) ☆35 · Nov 28, 2025 · Updated 3 months ago
- The official implementation of the DAC 2024 paper GQA-LUT ☆21 · Dec 20, 2024 · Updated last year
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆24 · Oct 5, 2024 · Updated last year
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi… ☆14 · Jun 6, 2025 · Updated 9 months ago
- Universal data IO and neural network modules in NLP tasks. ☆18 · Jun 21, 2022 · Updated 3 years ago
- Pruning methods for pytorch with an optimizer-like interface ☆15 · Apr 14, 2020 · Updated 5 years ago
- ☆46 · Sep 27, 2025 · Updated 5 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆363 · Nov 20, 2025 · Updated 4 months ago
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs ☆27 · Jun 25, 2024 · Updated last year
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ☆628 · Sep 11, 2024 · Updated last year
- ☆20 · Oct 13, 2024 · Updated last year
- [ICLR 2025] RaSA: Rank-Sharing Low-Rank Adaptation ☆10 · May 19, 2025 · Updated 10 months ago
- ☆13 · Nov 29, 2024 · Updated last year
- ☆17 · Feb 3, 2023 · Updated 3 years ago
- ☆12 · Jul 4, 2020 · Updated 5 years ago
- AnyDSL traversal code ☆15 · Feb 18, 2019 · Updated 7 years ago
- ☆15 · Apr 11, 2024 · Updated last year
- Multi-Layer Key-Value sharing experiments on Pythia models ☆34 · Jun 14, 2024 · Updated last year
- Optimize the order of execution for tf.einsum ☆13 · May 31, 2017 · Updated 8 years ago
- Command helper for slurm system. Act as if you are on compute node. ☆15 · Feb 1, 2025 · Updated last year
- ☆166 · Jun 22, 2025 · Updated 9 months ago
- ECNU NLP group learns CS224n in the form of seminars in the 2017 summer. ☆10 · Aug 12, 2017 · Updated 8 years ago
- ☆16 · Jul 23, 2024 · Updated last year
- Template for Makefile based SysY compiler projects. ☆11 · Jun 16, 2022 · Updated 3 years ago
- ☆14 · Jan 24, 2025 · Updated last year
- Compressing Large Language Models using Low Precision and Low Rank Decomposition ☆107 · Nov 24, 2025 · Updated 4 months ago
- I love Database. ☆19 · Dec 25, 2024 · Updated last year
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… ☆23 · Oct 1, 2025 · Updated 5 months ago
- NLPCC-2025 Shared-Task 1: LLM-Generated Text Detection ☆15 · May 19, 2025 · Updated 10 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] ☆67 · Oct 2, 2025 · Updated 5 months ago
- An extension of pytorch for low precision training / inference ☆10 · Aug 28, 2023 · Updated 2 years ago
- [IJCAI'19] Code for "Self-attentive Biaffine Dependency Parsing" ☆16 · Jun 13, 2019 · Updated 6 years ago