horseee/LLM-Pruner

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/horseee/LLM-Pruner)

horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.

☆1,133

Alternatives and similar repositories for LLM-Pruner

Users that are interested in LLM-Pruner are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

locuslab / wanda
View on GitHub
A simple and effective LLM pruning approach.
☆868Aug 9, 2024Updated last year
IST-DASLab / sparsegpt
View on GitHub
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
☆891Aug 20, 2024Updated last year
princeton-nlp / LLM-Shearing
View on GitHub
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
☆642Mar 4, 2024Updated 2 years ago
horseee / Awesome-Efficient-LLM
View on GitHub
A curated list for Efficient Large Language Models
☆2,024Jun 17, 2025Updated last year
CASIA-LMC-Lab / FLAP
View on GitHub
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆76Jan 6, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
microsoft / TransformerCompression
View on GitHub
For releasing code related to compression methods for transformers, accompanying our publications
☆461Jan 16, 2025Updated last year
VainF / Torch-Pruning
View on GitHub
[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.
☆3,328Sep 7, 2025Updated 10 months ago
pprp / Awesome-LLM-Prune
View on GitHub
Awesome list for LLM pruning.
☆297Oct 11, 2025Updated 9 months ago
HuangOwen / Awesome-LLM-Compression
View on GitHub
Awesome LLM compression research papers and tools.
☆1,854Jun 30, 2026Updated 3 weeks ago
horseee / LLaMA-Pruning
View on GitHub
Structural Pruning for LLaMA
☆54May 20, 2023Updated 3 years ago
mit-han-lab / smoothquant
View on GitHub
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆1,671Jul 12, 2024Updated 2 years ago
NVlabs / MaskLLM
View on GitHub
[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
☆189Jan 1, 2025Updated last year
mit-han-lab / llm-awq
View on GitHub
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,594Jul 17, 2025Updated last year
IST-DASLab / gptq
View on GitHub
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,337Mar 27, 2024Updated 2 years ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
luuyin / OWL
View on GitHub
Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
☆82Jul 7, 2025Updated last year
liyunqianggyn / Awesome-LLMs-Pruning
View on GitHub
Awesome LLM pruning papers all-in-one repository with integrating all useful resources and insights.
☆172May 19, 2026Updated 2 months ago
BaiTheBest / SparseLLM
View on GitHub
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆70Mar 27, 2025Updated last year
sramshetty / ShortGPT
View on GitHub
Unofficial implementations of block/layer-wise pruning methods for LLMs.
☆78Apr 29, 2024Updated 2 years ago
FMInference / DejaVu
View on GitHub
☆359Apr 2, 2024Updated 2 years ago
NVlabs / Minitron
View on GitHub
A family of compressed models obtained via pruning and knowledge distillation
☆384Nov 6, 2025Updated 8 months ago
FasterDecoding / Medusa
View on GitHub
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,758Jun 25, 2024Updated 2 years ago
OpenGVLab / OmniQuant
View on GitHub
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆903Nov 26, 2025Updated 7 months ago
arcee-ai / PruneMe
View on GitHub
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆267Apr 23, 2024Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
horseee / DeepCache
View on GitHub
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
☆970Jun 27, 2024Updated 2 years ago
EleutherAI / lm-evaluation-harness
View on GitHub
A framework for few-shot evaluation of language models.
☆13,401Jul 13, 2026Updated last week
aim-uofa / LoRAPrune
View on GitHub
☆63Dec 15, 2024Updated last year
facebookresearch / LLM-QAT
View on GitHub
Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"
☆326Mar 4, 2025Updated last year
pjlab-sys4nlp / llama-moe
View on GitHub
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
☆1,004Dec 6, 2024Updated last year
VainF / Diff-Pruning
View on GitHub
[NeurIPS 2023] Structural Pruning for Diffusion Models
☆222Jul 8, 2024Updated 2 years ago
he-y / Awesome-Pruning
View on GitHub
A curated list of neural network pruning resources.
☆2,496Apr 4, 2024Updated 2 years ago
FMInference / H2O
View on GitHub
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
☆530Aug 1, 2024Updated last year
ldery / Bonsai
View on GitHub
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆32Mar 28, 2024Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
OpenGVLab / EfficientQAT
View on GitHub
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆342Apr 10, 2026Updated 3 months ago
princeton-nlp / CoFiPruning
View on GitHub
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
☆199May 9, 2023Updated 3 years ago
SqueezeAILab / SqueezeLLM
View on GitHub
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆722Aug 13, 2024Updated last year
tianyic / only_train_once_personal_footprint
View on GitHub
OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM
☆311Sep 16, 2024Updated last year
WoosukKwon / retraining-free-pruning
View on GitHub
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
☆197Feb 28, 2023Updated 3 years ago
SafeAILab / EAGLE
View on GitHub
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
☆2,478Feb 20, 2026Updated 5 months ago
AutoGPTQ / AutoGPTQ
View on GitHub
An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
☆5,075Apr 11, 2025Updated last year