pprp / Awesome-LLM-Prune
Awesome list for LLM pruning.
☆282 · Oct 11, 2025 · Updated 4 months ago
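Many of the pruning papers collected below start from a simple magnitude criterion: drop the smallest-magnitude weights and keep the rest. As a minimal sketch of that idea in NumPy (an illustration, not code from any repository listed here):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries of a weight matrix.

    `sparsity` is the fraction of entries to remove (0.5 = 50%).
    Ties at the threshold may prune slightly more than requested.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Toy 2x3 "layer": half the entries are removed, largest magnitudes survive.
W = np.array([[0.1, -2.0, 0.3],
              [1.5, -0.05, 0.8]])
pruned = magnitude_prune(W, 0.5)
```

Structured methods (e.g. LLM-Pruner, Sheared LLaMA) remove whole rows, heads, or blocks instead of individual entries, and activation-aware metrics (e.g. Wanda, SparseGPT) replace the plain magnitude score with one that accounts for input statistics.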
Alternatives and similar repositories for Awesome-LLM-Prune
Users interested in Awesome-LLM-Prune are comparing it to the repositories listed below.
- Awesome LLM pruning papers in one repository, integrating useful resources and insights. ☆148 · Aug 8, 2025 · Updated 6 months ago
- [ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆98 · Nov 25, 2024 · Updated last year
- A simple and effective LLM pruning approach. ☆848 · Aug 9, 2024 · Updated last year
- [NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich… ☆1,106 · Oct 7, 2024 · Updated last year
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ☆69 · Jan 6, 2024 · Updated 2 years ago
- ☆56 · Jun 10, 2024 · Updated last year
- Awesome list for LLM quantization ☆390 · Oct 11, 2025 · Updated 4 months ago
- Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ☆82 · Jul 7, 2025 · Updated 7 months ago
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better ☆16 · Feb 15, 2025 · Updated last year
- A curated list for Efficient Large Language Models ☆1,950 · Jun 17, 2025 · Updated 7 months ago
- Is gradient information useful for pruning LLMs? ☆47 · Aug 23, 2025 · Updated 5 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆77 · Apr 29, 2024 · Updated last year
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop] ☆90 · Sep 13, 2024 · Updated last year
- Awesome LLM compression research papers and tools. ☆1,776 · Nov 10, 2025 · Updated 3 months ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ☆20 · Feb 16, 2024 · Updated 2 years ago
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot". ☆871 · Aug 20, 2024 · Updated last year
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆61 · Mar 25, 2025 · Updated 10 months ago
- [ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ☆39 · Feb 4, 2025 · Updated last year
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 ☆281 · Aug 28, 2025 · Updated 5 months ago
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models ☆187 · Jan 1, 2025 · Updated last year
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" ☆13 · Sep 28, 2025 · Updated 4 months ago
- ☆30 · Jul 22, 2024 · Updated last year
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆89 · Oct 22, 2024 · Updated last year
- [ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models". ☆23 · Mar 16, 2025 · Updated 11 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆640 · Mar 4, 2024 · Updated last year
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition ☆17 · Apr 16, 2025 · Updated 10 months ago
- Official repo for "Differentiable Model Scaling using Differentiable Topk" ☆11 · May 16, 2024 · Updated last year
- Official repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆67 · Mar 27, 2025 · Updated 10 months ago
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression ☆72 · Mar 25, 2025 · Updated 10 months ago
- ☆40 · Nov 22, 2025 · Updated 2 months ago
- Learnable Semi-structured Sparsity for Vision Transformers and Diffusion Transformers ☆14 · Feb 7, 2025 · Updated last year
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization ☆14 · Nov 27, 2024 · Updated last year
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The truth is rarely pure and never simple. ☆27 · Apr 21, 2025 · Updated 9 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆30 · Mar 28, 2024 · Updated last year
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆180 · Oct 3, 2024 · Updated last year
- [TMLR 2024] Efficient Large Language Models: A Survey ☆1,253 · Jun 23, 2025 · Updated 7 months ago
- Multi-Candidate Speculative Decoding ☆39 · Apr 22, 2024 · Updated last year
- ☆28 · Feb 21, 2025 · Updated 11 months ago
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" ☆211 · Nov 25, 2025 · Updated 2 months ago