snudm-starlab / K-prune
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models (ICLR 2024)
☆13 · Updated 2 months ago
Alternatives and similar repositories for K-prune
Users who are interested in K-prune are comparing it to the libraries listed below.
- SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning (ICLR 2025) ☆28 · Updated 6 months ago
- ☆28 · Updated 5 months ago
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model… ☆64 · Updated last year
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆64 · Updated 4 months ago
- Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ☆38 · Updated 6 months ago
- Official PyTorch implementation of our ICLR 2024 paper "Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM… ☆49 · Updated last year
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆111 · Updated last month
- Awesome LLM pruning papers: an all-in-one repository integrating useful resources and insights ☆111 · Updated last week
- ☆50 · Updated last year
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop] ☆85 · Updated 10 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆67 · Updated last year
- Structured pruning algorithm for Transformers ☆31 · Updated last year
- ☆10 · Updated 11 months ago
- ☆42 · Updated 9 months ago
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ☆44 · Updated last year
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ☆57 · Updated last year
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp… ☆20 · Updated last year
- Official code implementation for the ICLR 2025 paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives" ☆36 · Updated 4 months ago
- [COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.c… ☆26 · Updated 5 months ago
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆46 · Updated 9 months ago
- ☆26 · Updated last week
- PyTorch implementation of our ICML 2024 paper "CaM: Cache Merging for Memory-efficient LLMs Inference" ☆42 · Updated last year
- ☆59 · Updated last year
- SensiMix: Sensitivity-Aware 8-bit Index & 1-bit Value Mixed Precision Quantization for BERT Compression (PLOS One) ☆34 · Updated 3 years ago
- LLM Inference with Microscaling Format ☆25 · Updated 9 months ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆21 · Updated last year
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆90 · Updated last month
- ☆23 · Updated last week
- ☆55 · Updated 8 months ago
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs ☆18 · Updated 8 months ago