Code for the EMNLP 2024 paper "A simple and effective L2 norm-based method for KV Cache compression."
☆18 · Updated Dec 13, 2024
Alternatives and similar repositories for l2compress
Users interested in l2compress are comparing it to the repositories listed below.
- LongAttn: Selecting Long-context Training Data via Token-level Attention · ☆15 · Updated Jul 16, 2025
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" · ☆79 · Updated Nov 25, 2024
- Repository of IPBench · ☆20 · Updated Jan 4, 2026
- Official implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" · ☆28 · Updated Dec 18, 2024
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention · ☆52 · Updated Aug 6, 2025
- PyTorch implementation of StableMask (ICML'24) · ☆15 · Updated Jun 27, 2024
- Repository for the COLM 2025 paper "SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths" · ☆17 · Updated Jul 10, 2025
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) · ☆35 · Updated Mar 7, 2025
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor · ☆31 · Updated Apr 8, 2024
- ☆18 · Updated Dec 2, 2024
- Repository for "Training Language Models To Explain Their Own Computations" · ☆21 · Updated Dec 22, 2025
- Mixture of LoRA Experts · ☆10 · Updated Apr 7, 2024
- [ICLR 2025] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models · ☆27 · Updated Jul 7, 2025
- ☆13 · Updated Jul 2, 2025
- Official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" · ☆50 · Updated Oct 18, 2024
- ☆10 · Updated Oct 28, 2024
- LLM KV cache compression made easy · ☆971 · Updated Mar 13, 2026
- Efficient retrieval head analysis with Triton flash attention that supports top-K probability · ☆13 · Updated Jun 15, 2024
- Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models · ☆10 · Updated May 16, 2024
- Dataset and baseline for the COLING 2022 long paper (oral) "ConFiguRe: Exploring Discourse-level Chinese Figures of Speech" · ☆13 · Updated Jul 27, 2023
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference · ☆286 · Updated May 1, 2025
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches (EMNLP Findings 2024) · ☆89 · Updated Feb 27, 2025
- PyTorch implementation of "Compressed Context Memory for Online Language Model Interaction" (ICLR'24) · ☆63 · Updated Apr 18, 2024
- ☆39 · Updated Oct 16, 2025
- ☆47 · Updated Nov 25, 2024
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" · ☆236 · Updated Aug 2, 2024
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding · ☆28 · Updated Jan 28, 2024
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference) · ☆12 · Updated Oct 11, 2023
- Code and data for "Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction" (ECML-PKDD 22) · ☆14 · Updated Sep 6, 2022
- Code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… · ☆28 · Updated Jul 15, 2025
- [ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation · ☆251 · Updated Dec 16, 2024
- [ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models · ☆60 · Updated Jul 23, 2024
- Official repo for "SparseLLM: Global Pruning of LLMs" (NeurIPS 2024) · ☆67 · Updated Mar 27, 2025
- A Structured Span Selector (NAACL 2022). A structured span selector with a WCFG for span selection tasks (coreference resolution, semanti… · ☆21 · Updated Jul 11, 2022
- Official code repository for the paper "Key-value memory in the brain" · ☆31 · Updated Feb 25, 2025
- A dashboard for exploring timm learning rate schedulers · ☆20 · Updated Nov 22, 2024
- [NeurIPS 2024] Official implementation of "Image Copy Detection for Diffusion Models" · ☆18 · Updated Oct 1, 2024
- [ICML 2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen · ☆17 · Updated Sep 7, 2024
- Source code for Jellyfish, a soft real-time inference serving system · ☆15 · Updated Dec 20, 2022