Code for the EMNLP 2024 paper "A Simple and Effective L2 Norm-Based Method for KV Cache Compression."
☆18 · Updated Dec 13, 2024
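The paper behind this repository reports that keys with a low L2 norm tend to receive high attention, so the cache can be compressed by keeping only the KV pairs whose key vectors have the smallest norms. A minimal sketch of that heuristic, assuming list-of-vector inputs (the function name, shapes, and `keep_ratio` parameter are illustrative, not the repo's actual API):

```python
import math

def l2_compress(keys, values, keep_ratio=0.5):
    """Hypothetical sketch of L2-norm-based KV cache compression.

    keys/values: parallel lists of float vectors (one pair per cached token).
    Keeps the keep_ratio fraction of entries whose key L2 norms are smallest
    (low-norm keys correlate with high attention, per the paper) and evicts
    the rest, preserving the original token order.
    """
    n_keep = max(1, int(len(keys) * keep_ratio))
    norms = [math.sqrt(sum(x * x for x in k)) for k in keys]
    # Indices of the n_keep smallest-norm keys, restored to cache order.
    keep = sorted(sorted(range(len(keys)), key=lambda i: norms[i])[:n_keep])
    return [keys[i] for i in keep], [values[i] for i in keep]

# Toy usage: a 4-token cache compressed to 2 entries.
keys = [[3.0, 4.0], [0.1, 0.2], [1.0, 1.0], [0.0, 0.5]]
values = [[1.0], [2.0], [3.0], [4.0]]
ck, cv = l2_compress(keys, values, keep_ratio=0.5)
print(ck, cv)  # keeps the two lowest-norm keys (tokens 1 and 3)
```

In a real transformer this selection would run per head on tensor-shaped caches, but the eviction criterion is the same scalar norm ranking shown here.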
Alternatives and similar repositories for l2compress
Users interested in l2compress are comparing it to the repositories listed below.
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training · ☆23 · Updated Aug 18, 2024
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" · ☆82 · Updated Nov 25, 2024
- Official implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" · ☆29 · Updated Dec 18, 2024
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention · ☆53 · Updated Aug 6, 2025
- PyTorch implementation of StableMask (ICML'24) · ☆15 · Updated Jun 27, 2024
- Repository for the COLM 2025 paper "SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths" · ☆18 · Updated Jul 10, 2025
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) · ☆34 · Updated Mar 7, 2025
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor · ☆31 · Updated Apr 8, 2024
- ☆18 · Updated Dec 2, 2024
- Repository for "Training Language Models To Explain Their Own Computations" · ☆22 · Updated Dec 22, 2025
- Mixture of LoRA Experts · ☆10 · Updated Apr 7, 2024
- ☆13 · Updated Jul 2, 2025
- Official implementation of "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" · ☆50 · Updated Oct 18, 2024
- ☆10 · Updated Oct 28, 2024
- Efficient retrieval-head analysis with Triton flash attention that supports top-K probability · ☆13 · Updated Jun 15, 2024
- LLM KV cache compression made easy · ☆1,055 · Updated Apr 23, 2026
- Marathon: A Multiple-Choice Long-Context Evaluation Benchmark for Large Language Models · ☆10 · Updated May 16, 2024
- Dataset and baseline for the COLING 2022 long paper (oral) "ConFiguRe: Exploring Discourse-level Chinese Figures of Speech" · ☆13 · Updated Jul 27, 2023
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference · ☆296 · Updated May 1, 2025
- PyTorch implementation of "Compressed Context Memory for Online Language Model Interaction" (ICLR'24) · ☆63 · Updated Apr 18, 2024
- ☆47 · Updated Nov 25, 2024
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" · ☆238 · Updated Aug 2, 2024
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding · ☆30 · Updated Jan 28, 2024
- Rationale-Enhanced Language Models Are Better Continual Relation Learners (EMNLP 2023 Main Conference) · ☆12 · Updated Oct 11, 2023
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) · ☆40 · Updated Sep 30, 2025
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection · ☆155 · Updated Feb 20, 2025
- Code and data for "Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction" (ECML-PKDD 2022) · ☆15 · Updated Sep 6, 2022
- Code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… · ☆28 · Updated Jul 15, 2025
- IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse · ☆96 · Updated Mar 14, 2026
- [ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation · ☆253 · Updated Dec 16, 2024
- [ACL 2024 Oral] A Prospector of Long-Dependency Data for Large Language Models · ☆60 · Updated Jul 23, 2024
- Official repo for "SparseLLM: Global Pruning of LLMs" (NeurIPS 2024) · ☆68 · Updated Mar 27, 2025
- A Structured Span Selector (NAACL 2022). A structured span selector with a WCFG for span selection tasks (coreference resolution, semanti… · ☆21 · Updated Jul 11, 2022
- A dashboard for exploring timm learning rate schedulers · ☆20 · Updated Nov 22, 2024
- Official code repository for the paper "Key-Value Memory in the Brain" · ☆31 · Updated Feb 25, 2025
- [ICML'24] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen · ☆17 · Updated Sep 7, 2024
- [ACL 2025] 🔥 Code for "Empowering Multimodal Large Language Models with Evol-Instruct" · ☆22 · Updated May 15, 2025
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering · ☆80 · Updated Jan 16, 2026
- Official implementation of "Vector-ICL: In-context Learning with Continuous Vector Representations" (ICLR 2025) · ☆23 · Updated Jun 2, 2025