krafton-ai / lexico
KV cache compression via sparse coding
☆14 · Updated 5 months ago
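lexico's premise is that cached key/value vectors can be stored as sparse combinations of atoms from a shared dictionary, so each vector is kept as a handful of (index, coefficient) pairs instead of a dense float array. Below is a minimal NumPy sketch of that general idea using orthogonal matching pursuit; the dictionary size, sparsity level, and function names are illustrative assumptions, not lexico's actual code or API:

```python
import numpy as np

def omp_sparse_code(x, D, s):
    """Greedy orthogonal matching pursuit: approximate x with s atoms (rows of D).
    Returns the chosen atom indices and their least-squares coefficients.
    (Hypothetical helper for illustration, not part of lexico.)"""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(s):
        scores = np.abs(D @ residual)      # correlation of each atom with the residual
        scores[support] = -np.inf          # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        A = D[support].T                   # (dim, |support|) matrix of chosen atoms
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)  # re-fit coefficients on the support
        residual = x - A @ coef
    return np.array(support), coef

rng = np.random.default_rng(0)
D = rng.standard_normal((256, 64))                 # assumed dictionary: 256 atoms of dim 64
D /= np.linalg.norm(D, axis=1, keepdims=True)      # unit-norm atoms
k = rng.standard_normal(64)                        # one cached key vector
idx, coef = omp_sparse_code(k, D, s=8)             # store 8 pairs instead of 64 floats
k_hat = D[idx].T @ coef                            # approximate reconstruction at attention time
print(np.linalg.norm(k - k_hat) / np.linalg.norm(k))
```

At decode time the K/V vectors are reconstructed on the fly from the shared dictionary, trading a small amount of extra compute for a much smaller cache footprint.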
Alternatives and similar repositories for lexico
Users interested in lexico are comparing it to the repositories listed below.
- The implementation of "FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts" (ICCV 2025) ☆10 · Updated 3 months ago
- Official implementation of "FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation" ☆24 · Updated 5 months ago
- ☆16 · Updated last year
- Official implementation of "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) ☆15 · Updated 3 months ago
- Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy" ☆13 · Updated 7 months ago
- ☆18 · Updated 7 months ago
- ☆15 · Updated 10 months ago
- ☆10 · Updated last year
- 2D-TPE: Two-Dimensional Positional Encoding Enhances Table Understanding for Large Language Models (WWW 2025) ☆10 · Updated 6 months ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models ☆44 · Updated last year
- ☆12 · Updated 3 months ago
- [ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More ☆57 · Updated 8 months ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆54 · Updated 10 months ago
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT model pre-training ☆30 · Updated 3 months ago
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR) ☆78 · Updated 7 months ago
- Official repo for "SparseLLM: Global Pruning of LLMs" (NeurIPS 2024) ☆66 · Updated 6 months ago
- ☆14 · Updated 11 months ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for fine-tuning LLMs. 🚀 The official implementation of https://arx… ☆23 · Updated 8 months ago
- [NeurIPS 2025] The official code implementation for the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok… ☆51 · Updated 2 weeks ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆38 · Updated last year
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆19 · Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆119 · Updated 3 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆108 · Updated this week
- Work in progress. ☆74 · Updated 3 months ago
- ☆14 · Updated last year
- [EMNLP 2024] Quantize LLMs to extremely low bit-widths, and fine-tune the quantized LLMs ☆14 · Updated last year
- ☆15 · Updated 4 months ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆95 · Updated 3 months ago
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ☆58 · Updated 3 months ago