Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆18 · Dec 13, 2024 · Updated last year
Alternatives and similar repositories for l2compress
Users that are interested in l2compress are comparing it to the libraries listed below.
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆23 · Aug 18, 2024 · Updated last year
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆82 · Nov 25, 2024 · Updated last year
- [ACL 2026] Repository of IPBench ☆23 · Apr 6, 2026 · Updated 2 months ago
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" ☆30 · Dec 18, 2024 · Updated last year
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆54 · Aug 6, 2025 · Updated 10 months ago
- PyTorch implementation of StableMask (ICML'24) ☆15 · Jun 27, 2024 · Updated last year
- Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths ☆19 · Jul 10, 2025 · Updated 11 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆34 · Mar 7, 2025 · Updated last year
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor ☆32 · Apr 8, 2024 · Updated 2 years ago
- ☆18 · Dec 2, 2024 · Updated last year
- Academic LaTeX CV template for emerging and early-career researchers ☆39 · Mar 3, 2026 · Updated 3 months ago
- Repository for "Training Language Models To Explain Their Own Computations" ☆22 · Dec 22, 2025 · Updated 5 months ago
- A collection of implementations of fair ML algorithms ☆12 · Jan 7, 2018 · Updated 8 years ago
- SNp Exploration and Analysis using EPigenomics data ☆12 · Jan 14, 2025 · Updated last year
- Mixture of Lora Experts ☆11 · Apr 7, 2024 · Updated 2 years ago
- [ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models ☆27 · Jul 7, 2025 · Updated 11 months ago
- ☆13 · Jul 2, 2025 · Updated 11 months ago
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆51 · Oct 18, 2024 · Updated last year
- ☆10 · Oct 28, 2024 · Updated last year
- Data, notebooks, and articles associated with the RSNA AI Deep Learning Lab at RSNA 2024 ☆14 · Dec 4, 2024 · Updated last year
- LLM KV cache compression made easy ☆1,112 · Updated this week
- Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models. ☆10 · May 16, 2024 · Updated 2 years ago
- Expert-Guided Protein Language Models enable Accurate and Blazingly Fast Fitness Prediction ☆17 · Feb 6, 2026 · Updated 4 months ago
- Dataset and baseline for Coling 2022 long paper (oral): "ConFiguRe: Exploring Discourse-level Chinese Figures of Speech" ☆13 · Jul 27, 2023 · Updated 2 years ago
- Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals; ACL 2024 ☆13 · May 24, 2024 · Updated 2 years ago
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆306 · May 1, 2025 · Updated last year
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches. EMNLP Findings 2024 ☆90 · Feb 27, 2025 · Updated last year
- Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24) ☆63 · Apr 18, 2024 · Updated 2 years ago
- ☆47 · Oct 16, 2025 · Updated 7 months ago
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆240 · Aug 2, 2024 · Updated last year
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding ☆31 · Jan 28, 2024 · Updated 2 years ago
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference) ☆12 · Oct 11, 2023 · Updated 2 years ago
- Source code for MutPred2.0 ☆16 · Mar 27, 2025 · Updated last year
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ☆159 · Feb 20, 2025 · Updated last year
- Code and data for Distributional Correlation–Aware Knowledge Distillation for Stock Trading Volume Prediction (ECML-PKDD 22) ☆16 · Sep 6, 2022 · Updated 3 years ago
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… ☆29 · Jul 15, 2025 · Updated 11 months ago
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) ☆45 · May 20, 2026 · Updated 3 weeks ago
- [ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆254 · Dec 16, 2024 · Updated last year
- [ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models ☆60 · Jul 23, 2024 · Updated last year