krafton-ai / lexico
KV cache compression via sparse coding
☆14 · Updated last month
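The "sparse coding" in the description refers to approximating each cached key/value vector as a sparse combination of atoms from a dictionary, so that only a few (index, coefficient) pairs need to be stored instead of the full dense vector. A minimal, generic sketch of that idea, using greedy OMP-style coding over an orthonormal demo dictionary (an illustration of the general technique, not Lexico's actual algorithm):

```python
import numpy as np

def sparse_code(x, D, k):
    """Greedily approximate x as D @ c with at most k nonzero coefficients (OMP-style)."""
    residual = x.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if j not in support:
            support.append(j)
        # Refit coefficients on the current support by least squares
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    c = np.zeros(D.shape[1])
    c[support] = coeffs
    return c

rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((64, 64)))  # orthonormal demo dictionary
x = 1.5 * D[:, 3] - 0.7 * D[:, 40]                  # a vector that is exactly 2-sparse in D
c = sparse_code(x, D, k=2)
# Storing the 2 (index, value) pairs of c replaces the dense 64-dim vector x.
```

Real sparse-coding compressors typically use a learned, overcomplete dictionary rather than an orthonormal one; the orthonormal choice here just makes the small demo deterministic.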
Alternatives and similar repositories for lexico
Users interested in lexico are comparing it to the repositories listed below.
- ☆18 · Updated last year
- ☆11 · Updated 11 months ago
- The implementation for FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts (ICCV25) · ☆10 · Updated 5 months ago
- ☆15 · Updated last year
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) · ☆15 · Updated 4 months ago
- Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy" · ☆13 · Updated 8 months ago
- Official implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration · ☆27 · Updated last week
- ☆20 · Updated 8 months ago
- 2D-TPE: Two-Dimensional Positional Encoding Enhances Table Understanding for Large Language Models (WWW 2025) · ☆10 · Updated 7 months ago
- ☆61 · Updated 4 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ☆126 · Updated 5 months ago
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT model pre-training · ☆32 · Updated 5 months ago
- ☆10 · Updated last year
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference · ☆54 · Updated last year
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" · ☆25 · Updated last month
- Official repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) · ☆67 · Updated 8 months ago
- ☆26 · Updated this week
- [NeurIPS'25] The official code implementation for the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok… · ☆59 · Updated 2 weeks ago
- ☆12 · Updated 4 months ago
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity" · ☆27 · Updated last year
- ☆29 · Updated 6 months ago
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More · ☆33 · Updated 6 months ago
- ☆19 · Updated last year
- [ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs · ☆96 · Updated last year
- [ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" · ☆99 · Updated 5 months ago
- Source code for "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs" · ☆41 · Updated last year
- [NeurIPS 2025 Oral] Official implementation of "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink… · ☆108 · Updated 2 months ago
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) · ☆30 · Updated last month
- Flash-Linear-Attention models beyond language · ☆20 · Updated 3 months ago
- Work in progress. · ☆75 · Updated 5 months ago