Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆18 · Dec 13, 2024 · Updated last year
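The title names the core idea: rank cached keys by their L2 norm. The paper's abstract reports that a low L2 norm of a key embedding usually coincides with a high attention score during decoding, so low-norm entries are the ones worth keeping. The following is a minimal pure-Python sketch for a single attention head. It assumes a fixed cache budget `keep`. `compress_kv_by_key_norm` is a hypothetical helper, not code from the l2compress repository.

```python
import math
from typing import List, Tuple

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def compress_kv_by_key_norm(
    keys: List[Vector], values: List[Vector], keep: int
) -> Tuple[List[Vector], List[Vector], List[int]]:
    """Hypothetical helper: keep the `keep` (key, value) pairs whose keys
    have the smallest L2 norm, for one attention head.

    Returns the retained keys, values, and their original positions,
    preserving the original token order among retained entries.
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    if keep < 0:
        raise ValueError("keep must be non-negative")
    if keep >= len(keys):
        # Budget covers the whole cache: nothing to evict.
        kept = list(range(len(keys)))
    else:
        # Rank positions by key norm, ascending (low norm = assumed high attention).
        ranked = sorted(range(len(keys)), key=lambda i: l2_norm(keys[i]))
        # Restore positional order among the survivors.
        kept = sorted(ranked[:keep])
    return [keys[i] for i in kept], [values[i] for i in kept], kept


if __name__ == "__main__":
    keys = [[3.0, 4.0], [0.1, 0.2], [1.0, 0.0], [0.0, 2.0]]  # norms: 5.0, ~0.22, 1.0, 2.0
    values = [[10.0], [11.0], [12.0], [13.0]]
    k, v, idx = compress_kv_by_key_norm(keys, values, keep=2)
    print(idx)  # [1, 2]
    print(v)    # [[11.0], [12.0]]
```

Keeping the survivors in their original order (rather than in norm order) keeps positional information consistent for downstream attention; a real implementation would apply this per layer and per head on tensors.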
Alternatives and similar repositories for l2compress
Users that are interested in l2compress are comparing it to the libraries listed below.
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 9 months ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆23 · Aug 18, 2024 · Updated last year
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆81 · Nov 25, 2024 · Updated last year
- Repository of IPBench ☆20 · Apr 6, 2026 · Updated last week
- PyTorch implementation of StableMask (ICML'24) ☆15 · Jun 27, 2024 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆34 · Mar 7, 2025 · Updated last year
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor ☆31 · Apr 8, 2024 · Updated 2 years ago
- ☆18 · Dec 2, 2024 · Updated last year
- Code repo for paper on Generative Adversarial Networks for Medical Imaging ☆11 · Mar 20, 2020 · Updated 6 years ago
- Repository for "Training Language Models To Explain Their Own Computations" ☆21 · Dec 22, 2025 · Updated 3 months ago
- A collection of implementations of fair ML algorithms ☆12 · Jan 7, 2018 · Updated 8 years ago
- Enformer Celltyping is a TensorFlow, multi-headed attention based model that incorporates distal effects of Deoxyribonucleic Acid (DNA) i… ☆16 · Jun 25, 2025 · Updated 9 months ago
- SNp Exploration and Analysis using EPigenomics data ☆11 · Jan 14, 2025 · Updated last year
- Mixture of LoRA Experts ☆10 · Apr 7, 2024 · Updated 2 years ago
- [ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models ☆27 · Jul 7, 2025 · Updated 9 months ago
- ☆13 · Jul 2, 2025 · Updated 9 months ago
- The official implementation of the paper SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆50 · Oct 18, 2024 · Updated last year
- ☆10 · Oct 28, 2024 · Updated last year
- Data, notebooks, and articles associated with the RSNA AI Deep Learning Lab at RSNA 2024 ☆13 · Dec 4, 2024 · Updated last year
- LLM KV cache compression made easy ☆1,021 · Apr 9, 2026 · Updated last week
- Efficient retrieval head analysis with Triton flash attention that supports topK probability ☆13 · Jun 15, 2024 · Updated last year
- Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models. ☆10 · May 16, 2024 · Updated last year
- Expert-Guided Protein Language Models enable Accurate and Blazingly Fast Fitness Prediction ☆17 · Feb 6, 2026 · Updated 2 months ago
- Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals ☆12 · May 24, 2024 · Updated last year
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆291 · May 1, 2025 · Updated 11 months ago
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches. EMNLP Findings 2024 ☆90 · Feb 27, 2025 · Updated last year
- PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24) ☆63 · Apr 18, 2024 · Updated last year
- ☆47 · Nov 25, 2024 · Updated last year
- ☆41 · Oct 16, 2025 · Updated 6 months ago
- Open-source code for the paper Retrieval Head Mechanistically Explains Long-Context Factuality ☆237 · Aug 2, 2024 · Updated last year
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding ☆29 · Jan 28, 2024 · Updated 2 years ago
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference) ☆12 · Oct 11, 2023 · Updated 2 years ago
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) ☆37 · Sep 30, 2025 · Updated 6 months ago
- Source code for MutPred2.0 ☆16 · Mar 27, 2025 · Updated last year
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ☆154 · Feb 20, 2025 · Updated last year
- Code and data for Distributional Correlation–Aware Knowledge Distillation for Stock Trading Volume Prediction (ECML-PKDD 22) ☆15 · Sep 6, 2022 · Updated 3 years ago
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… ☆28 · Jul 15, 2025 · Updated 9 months ago
- Implementation for MICCAI DART paper: 'Detecting Melanoma Fairly: Skin Tone Detection and Debiasing for Skin Lesion Classification' ☆18 · Jun 22, 2022 · Updated 3 years ago
- [ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆253 · Dec 16, 2024 · Updated last year