henryzhongsc / longctx_bench
View external linksLinks

KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches. EMNLP Findings 2024

☆88

Alternatives and similar repositories for longctx_bench

Users that are interested in longctx_bench are comparing it to the libraries listed below

Sorting:

guanchuwang / Taylor-Unswift
View on GitHub
☆22Oct 3, 2024Updated last year
aladinggit / RDMI
View on GitHub
This is the repo for remote direct memory introspection.
☆23Jun 21, 2023Updated 2 years ago
jy-yuan / KIVI
View on GitHub
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆356Nov 20, 2025Updated 2 months ago
Leooyii / LCEG
View on GitHub
Long Context Extension and Generalization in LLMs
☆62Sep 21, 2024Updated last year
datamllab / ltsm
View on GitHub
LTSM-Bundle: A Toolbox and Benchmark on Large Language Models for Time Series Forecasting
☆106Sep 8, 2025Updated 5 months ago
October2001 / Awesome-KV-Cache-Compression
View on GitHub
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
☆658Sep 30, 2025Updated 4 months ago
FasterDecoding / SnapKV
View on GitHub
☆302Jul 10, 2025Updated 7 months ago
ZichengXu / Decoding-Tree-Sketching
View on GitHub
Decoding Tree Sketching (DTS): a training-free decoding framework for structured multi-trajectory exploration and shortest trajectory sel…
☆64Updated this week
IBM / selective-dense-state-space-model
View on GitHub
Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …
☆14Sep 18, 2025Updated 4 months ago
mit-han-lab / Quest
View on GitHub
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆372Jul 10, 2025Updated 7 months ago
SqueezeAILab / SqueezedAttention
View on GitHub
[ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference
☆56Nov 20, 2024Updated last year
snu-mllab / Context-Memory
View on GitHub
Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
☆62Apr 18, 2024Updated last year
kazuki-irie / kv-memory-brain
View on GitHub
Official Code Repository for the paper "Key-value memory in the brain"
☆31Feb 25, 2025Updated 11 months ago
lemyx / tilelang-dsa
View on GitHub
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆43Nov 19, 2025Updated 2 months ago
assafbk / DeciMamba
View on GitHub
DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)
☆32Apr 9, 2025Updated 10 months ago
PKU-ML / LongPPL
View on GitHub
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆109Oct 11, 2025Updated 4 months ago
AlibabaPAI / FLASHNN
View on GitHub
☆105Sep 9, 2024Updated last year
FFY0 / AdaKV
View on GitHub
The Official Implementation of Ada-KV [NeurIPS 2025]
☆129Nov 26, 2025Updated 2 months ago
sail-sg / SimLayerKV
View on GitHub
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆52Oct 18, 2024Updated last year
Zanette-Labs / SpeculativeRejection
View on GitHub
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
☆55Oct 29, 2024Updated last year
alessiodevoto / l2compress
View on GitHub
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆18Dec 13, 2024Updated last year
nightdessert / Retrieval_Head
View on GitHub
open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality
☆231Aug 2, 2024Updated last year
test-time-training / ttt-tk
View on GitHub
☆44Nov 1, 2025Updated 3 months ago
Infini-AI-Lab / MagicPIG
View on GitHub
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
☆249Dec 16, 2024Updated last year
IsaacRe / vllm-kvcompress
View on GitHub
KV cache compression for high-throughput LLM inference
☆154Feb 5, 2025Updated last year
THUDM / LongBench
View on GitHub
LongBench v2 and LongBench (ACL 25'&24')
☆1,093Jan 15, 2025Updated last year
microsoft / MInference
View on GitHub
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…
☆1,183Sep 30, 2025Updated 4 months ago
johanwind / wind_rwkv
View on GitHub
☆27Jul 28, 2025Updated 6 months ago
AIoT-MLSys-Lab / D2O
View on GitHub
[ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
☆27Jul 7, 2025Updated 7 months ago
Doraemonzzz / xmixers
View on GitHub
Xmixers: A collection of SOTA efficient token/channel mixers
☆28Sep 4, 2025Updated 5 months ago
uservan / ThinkPO
View on GitHub
☆17Aug 1, 2025Updated 6 months ago
emalach / LinearLM
View on GitHub
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆21Jul 29, 2024Updated last year
shreyansh26 / Attention-Mask-Patterns
View on GitHub
Using FlexAttention to compute attention with different masking patterns
☆47Sep 22, 2024Updated last year
FMInference / H2O
View on GitHub
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
☆502Aug 1, 2024Updated last year
srush / mamba-primer
View on GitHub
☆39Apr 5, 2024Updated last year
lzhangbv / acpsgd
View on GitHub
[ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning
☆10Apr 28, 2023Updated 2 years ago
ynchuang / DiscoverPath
View on GitHub
DiscoverPath, a KG-based re- trieval system designed for biomedical research. This system aims to assist biomedical researchers in dynami…
☆28Oct 25, 2023Updated 2 years ago
00ffcc / chunkRWKV6
View on GitHub
continous batching and parallel acceleration for RWKV6
☆22Jun 28, 2024Updated last year
princeton-nlp / ProLong
View on GitHub
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
☆246Sep 12, 2025Updated 5 months ago

henryzhongsc / longctx_benchView external linksLinks

Alternatives and similar repositories for longctx_bench

henryzhongsc / longctx_bench
View external linksLinks