alessiodevoto / l2compress
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆18 · Dec 13, 2024 · Updated last year
Alternatives and similar repositories for l2compress
Users that are interested in l2compress are comparing it to the libraries listed below
- Repository of IPBench ☆19 · Jan 4, 2026 · Updated last month
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" ☆28 · Dec 18, 2024 · Updated last year
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆78 · Nov 25, 2024 · Updated last year
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆52 · Aug 6, 2025 · Updated 6 months ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 6 months ago
- ☆13 · Jul 2, 2025 · Updated 7 months ago
- ☆10 · Oct 28, 2024 · Updated last year
- Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths ☆15 · Jul 10, 2025 · Updated 7 months ago
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor ☆31 · Apr 8, 2024 · Updated last year
- ☆15 · Apr 2, 2025 · Updated 10 months ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆18 · Oct 1, 2024 · Updated last year
- Synthesizing realistic and diverse text-datasets from augmented LLMs ☆16 · Jan 26, 2026 · Updated 2 weeks ago
- PyTorch implementation of StableMask (ICML'24) ☆15 · Jun 27, 2024 · Updated last year
- The official implementation of the DAC 2024 paper GQA-LUT ☆20 · Dec 20, 2024 · Updated last year
- [ICML 2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen ☆18 · Sep 7, 2024 · Updated last year
- A no-string API framework for deploying schema-based reasoning into third-party apps ☆23 · Updated this week
- A dashboard for exploring timm learning rate schedulers ☆19 · Nov 22, 2024 · Updated last year
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… ☆27 · Jul 15, 2025 · Updated 6 months ago
- ☆20 · Jan 16, 2025 · Updated last year
- ☆18 · Dec 2, 2024 · Updated last year
- Code for paper: Long cOntext aliGnment via efficient preference Optimization ☆24 · Oct 10, 2025 · Updated 4 months ago
- [EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Ut… ☆23 · Dec 4, 2024 · Updated last year
- ☆22 · Feb 29, 2024 · Updated last year
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) ☆32 · Sep 30, 2025 · Updated 4 months ago
- A Structured Span Selector (NAACL 2022). A structured span selector with a WCFG for span selection tasks (coreference resolution, semanti… ☆21 · Jul 11, 2022 · Updated 3 years ago
- Official implementation of Vector-ICL: In-context Learning with Continuous Vector Representations (ICLR 2025) ☆21 · Jun 2, 2025 · Updated 8 months ago
- [ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models ☆27 · Jul 7, 2025 · Updated 7 months ago
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆52 · Oct 18, 2024 · Updated last year
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆23 · Aug 18, 2024 · Updated last year
- ☆56 · Nov 6, 2024 · Updated last year
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆126 · Nov 26, 2025 · Updated 2 months ago
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model ☆24 · Oct 12, 2024 · Updated last year
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆61 · Feb 21, 2025 · Updated 11 months ago
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding ☆28 · Jan 28, 2024 · Updated 2 years ago
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆283 · May 1, 2025 · Updated 9 months ago
- PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24) ☆62 · Apr 18, 2024 · Updated last year
- Sparse and discrete interpretability tool for neural networks ☆64 · Feb 12, 2024 · Updated 2 years ago
- Official PyTorch implementations for "Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation" (EC… ☆33 · Mar 15, 2025 · Updated 10 months ago
- ☆36 · Oct 16, 2025 · Updated 3 months ago