liyunqianggyn/Awesome-LLMs-Pruning

An all-in-one repository of awesome LLM pruning papers, integrating useful resources and insights.

☆172

Alternatives and similar repositories for Awesome-LLMs-Pruning

Users that are interested in Awesome-LLMs-Pruning are comparing it to the libraries listed below.

pprp / Awesome-LLM-Prune
Awesome list for LLM pruning.
☆297 · Oct 11, 2025 · Updated 9 months ago
horseee / LLM-Pruner
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich…
☆1,130 · Oct 7, 2024 · Updated last year
pprp / STBLLM
[ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
☆20 · Jun 3, 2025 · Updated last year
hrcheng1066 / awesome-pruning
☆305 · Aug 20, 2024 · Updated last year
pprp / ACBench
[ICML25] Agentic Compression Benchmark (ACBench)
☆17 · Jul 2, 2025 · Updated last year
FabrizioSandri / 2SSP
2SSP: A Two-Stage Framework for Structured Pruning of LLMs
☆21 · Aug 18, 2025 · Updated 11 months ago
IST-DASLab / EvoPress
☆43 · Jun 14, 2026 · Updated last month
horseee / Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
☆2,023 · Jun 17, 2025 · Updated last year
wzhuang-xmu / LoSA
[ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models".
☆25 · Mar 16, 2025 · Updated last year
fmfi-compbio / admm-pruning
☆30 · Jul 22, 2024 · Updated last year
locuslab / wanda
A simple and effective LLM pruning approach.
☆868 · Aug 9, 2024 · Updated last year
ylsung / ECoFLaP
Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024)
☆21 · Feb 16, 2024 · Updated 2 years ago
OpenGVLab / EfficientQAT
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆342 · Apr 10, 2026 · Updated 3 months ago
HuangOwen / Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
☆1,853 · Jun 30, 2026 · Updated 3 weeks ago
pprp / Awesome-LLM-Quantization
Awesome list for LLM quantization
☆432 · Apr 20, 2026 · Updated 3 months ago
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆70 · Mar 27, 2025 · Updated last year
XIANGLONGYAN / PBS2P
PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs"
☆13 · Jul 11, 2026 · Updated last week
arcee-ai / PruneMe
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆266 · Apr 23, 2024 · Updated 2 years ago
yangyifei729 / LaCo
Official implementation for LaCo (EMNLP 2024 Findings)
☆22 · Oct 3, 2024 · Updated last year
shawnricecake / search-llm
[NeurIPS 2024] Search for Efficient LLMs
☆16 · Jan 16, 2025 · Updated last year
CodeEval-Pro / CodeEval-Pro
[ACL'25 Findings] Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task"
☆40 · Apr 7, 2025 · Updated last year
A-suozhang / MixDQ
[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
☆14 · Nov 27, 2024 · Updated last year
zyxxmu / DSnoT
Official PyTorch Implementation of Our Paper Accepted at ICLR 2024: Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM…
☆50 · Apr 9, 2024 · Updated 2 years ago
OpenBitSys / BitDistiller
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
☆139 · May 16, 2024 · Updated 2 years ago
aim-uofa / LoRAPrune
☆63 · Dec 15, 2024 · Updated last year
ModelTC / Outlier_Suppression_Plus
Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…
☆52 · Oct 21, 2023 · Updated 2 years ago
inclusionAI / MoBE
Mixture-of-Basis-Experts for Compressing MoE-based LLMs
☆37 · Dec 24, 2025 · Updated 6 months ago
pixeli99 / MixLN
[ICLR 2025] Official PyTorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆30 · Jul 24, 2025 · Updated 11 months ago
facebookresearch / ParetoQ
This repository contains the training code of ParetoQ introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization"
☆131 · Oct 15, 2025 · Updated 9 months ago
haizhongzheng / LTE
☆13 · Oct 13, 2025 · Updated 9 months ago
MrGGLS / BlockPruner
A block pruning framework for LLMs.
☆28 · May 17, 2025 · Updated last year
Zhen-Dong / Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
☆834 · Mar 27, 2025 · Updated last year
Hsu1023 / DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆186 · Apr 24, 2026 · Updated 2 months ago
SAI-Lab-NYU / QSVD
This repository provides the official implementation of QSVD, a method for efficient low-rank approximation that unifies Query-Key-Value …
☆28 · May 16, 2026 · Updated 2 months ago
Kai-Liu001 / Awesome-Model-Quantization
This repository contains low-bit quantization papers from top conferences, 2020 to 2026.
☆192 · Jun 25, 2026 · Updated 3 weeks ago
jiwonsong-dev / SLEB
[ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
☆41 · Feb 4, 2025 · Updated last year
Zishan-Shao / FlashSVD
[AAAI 2026] Official implementation of "FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models". If you find this reposi…
☆17 · May 1, 2026 · Updated 2 months ago
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆72 · Mar 7, 2024 · Updated 2 years ago
AIoT-MLSys-Lab / SVD-LLM
[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2
☆301 · Aug 28, 2025 · Updated 10 months ago