krafton-ai / lexico

KV cache compression via sparse coding

☆14

Alternatives and similar repositories for lexico

Users who are interested in lexico are comparing it to the libraries listed below.

TUDa-HWAI / Basis_Sharing
☆17 · Updated last year
Zhengsh123 / FREE-Merging
The implementation for FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts (ICCV25)
☆10 · Updated 4 months ago
itsdaniele / speculative_mamba
☆15 · Updated 11 months ago
dongwonjo / FastKV
Official Implementation of FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
☆25 · Updated 5 months ago
thu-ml / Adaptive-Sparse-Trainer
Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025)
☆15 · Updated 4 months ago
UNITES-Lab / HEXA-MoE
Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy"
☆13 · Updated 8 months ago
JinaLeejnl / 2D-TPE
2D-TPE: Two-Dimensional Positional Encoding Enhances Table Understanding for Large Language Models (WWW 2025)
☆10 · Updated 6 months ago
OpenSparseLLMs / Linearization
☆61 · Updated 4 months ago
LLMkvsys / rethink-kv-compression
☆18 · Updated 8 months ago
AkideLiu / MiniCache
☆10 · Updated last year
horseee / dKV-Cache
[NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models
☆114 · Updated 5 months ago
hahnyuan / ASVD4LLM
Activation-aware Singular Value Decomposition for Compressing Large Language Models
☆80 · Updated last year
ruipeterpan / specreason
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
☆57 · Updated last month
sramshetty / mixture-of-depths
An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆36 · Updated last year
Jikai0Wang / OPT-Tree
☆29 · Updated 5 months ago
Multiverse4FM / Multiverse
☆81 · Updated 4 months ago
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆124 · Updated 4 months ago
jiwonsong-dev / ReasoningPathCompression
[NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning"
☆24 · Updated 2 weeks ago
abdelfattah-lab / TokenButler
☆26 · Updated 3 months ago
Lucky-Lance / Expert_Sparsity
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
☆108 · Updated last year
jiwonsong-dev / SLEB
[ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
☆37 · Updated 9 months ago
Infini-AI-Lab / Multiverse
☆100 · Updated last month
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆151 · Updated 3 weeks ago
NimrodShabtay / LiveXiv
☆12 · Updated 3 months ago
IST-DASLab / QuEST
Work in progress.
☆74 · Updated 4 months ago
qiuzh20 / gated_attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…
☆101 · Updated last month
IST-DASLab / HALO
HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…
☆28 · Updated 8 months ago
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆94 · Updated 11 months ago
thunlp / SparsingLaw
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆27 · Updated 11 months ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆36 · Updated last year