locuslab/wanda

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/locuslab/wanda)

locuslab / wanda

A simple and effective LLM pruning approach.

☆868

Alternatives and similar repositories for wanda

Users that are interested in wanda are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

IST-DASLab / sparsegpt
View on GitHub
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
☆889Aug 20, 2024Updated last year
horseee / LLM-Pruner
View on GitHub
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich…
☆1,130Oct 7, 2024Updated last year
luuyin / OWL
View on GitHub
Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
☆81Jul 7, 2025Updated last year
princeton-nlp / LLM-Shearing
View on GitHub
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
☆640Mar 4, 2024Updated 2 years ago
pprp / Awesome-LLM-Prune
View on GitHub
Awesome list for LLM pruning.
☆297Oct 11, 2025Updated 9 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
SqueezeAILab / SqueezeLLM
View on GitHub
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆722Aug 13, 2024Updated last year
microsoft / TransformerCompression
View on GitHub
For releasing code related to compression methods for transformers, accompanying our publications
☆461Jan 16, 2025Updated last year
mit-han-lab / llm-awq
View on GitHub
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,592Jul 17, 2025Updated last year
OpenGVLab / OmniQuant
View on GitHub
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆901Nov 26, 2025Updated 7 months ago
biomedical-cybernetics / Relative-importance-and-activation-pruning
View on GitHub
☆60Jun 10, 2024Updated 2 years ago
HuangOwen / Awesome-LLM-Compression
View on GitHub
Awesome LLM compression research papers and tools.
☆1,853Jun 30, 2026Updated 2 weeks ago
CASIA-LMC-Lab / FLAP
View on GitHub
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆76Jan 6, 2024Updated 2 years ago
horseee / Awesome-Efficient-LLM
View on GitHub
A curated list for Efficient Large Language Models
☆2,023Jun 17, 2025Updated last year
IST-DASLab / gptq
View on GitHub
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,334Mar 27, 2024Updated 2 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
zyxxmu / DSnoT
View on GitHub
Official Pytorch Implementation of Our Paper Accepted at ICLR 2024-- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM…
☆50Apr 9, 2024Updated 2 years ago
LinkAnonymous / BESA
View on GitHub
☆12Oct 9, 2023Updated 2 years ago
Vahe1994 / SpQR
View on GitHub
☆554Feb 8, 2026Updated 5 months ago
mit-han-lab / smoothquant
View on GitHub
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆1,670Jul 12, 2024Updated 2 years ago
BaiTheBest / SparseLLM
View on GitHub
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆70Mar 27, 2025Updated last year
pprp / Pruner-Zero
View on GitHub
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆100Nov 25, 2024Updated last year
OpenGVLab / LLMPrune-BESA
View on GitHub
BESA is a differentiable weight pruning technique for large language models.
☆17Mar 4, 2024Updated 2 years ago
NVlabs / MaskLLM
View on GitHub
[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
☆189Jan 1, 2025Updated last year
aim-uofa / LoRAPrune
View on GitHub
☆63Dec 15, 2024Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
sramshetty / ShortGPT
View on GitHub
Unofficial implementations of block/layer-wise pruning methods for LLMs.
☆78Apr 29, 2024Updated 2 years ago
hahnyuan / RPTQ4LLM
View on GitHub
Reorder-based post-training quantization for large language model
☆199May 17, 2023Updated 3 years ago
MrGGLS / BlockPruner
View on GitHub
A block pruning framework for LLMs.
☆28May 17, 2025Updated last year
Cornell-RelaxML / QuIP
View on GitHub
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆399Feb 24, 2024Updated 2 years ago
FMInference / DejaVu
View on GitHub
☆359Apr 2, 2024Updated 2 years ago
princeton-nlp / MeZO
View on GitHub
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
☆1,168Jan 11, 2024Updated 2 years ago
yangyifei729 / LaCo
View on GitHub
Official implementation for LaCo (EMNLP 2024 Findings)
☆22Oct 3, 2024Updated last year
Aaronhuang-778 / BiLLM
View on GitHub
[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
☆235Jan 11, 2025Updated last year
WoosukKwon / retraining-free-pruning
View on GitHub
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
☆197Feb 28, 2023Updated 3 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
HanGuo97 / lq-lora
View on GitHub
☆129Jan 22, 2024Updated 2 years ago
EleutherAI / lm-evaluation-harness
View on GitHub
A framework for few-shot evaluation of language models.
☆13,340Jul 13, 2026Updated last week
arcee-ai / PruneMe
View on GitHub
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆266Apr 23, 2024Updated 2 years ago
yuhuixu1993 / qa-lora
View on GitHub
Official PyTorch implementation of QA-LoRA
☆147Mar 13, 2024Updated 2 years ago
OpenLMLab / LOMO
View on GitHub
LOMO: LOw-Memory Optimization
☆994Jul 2, 2024Updated 2 years ago
facebookresearch / LLM-QAT
View on GitHub
Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"
☆325Mar 4, 2025Updated last year
IST-DASLab / OBC
View on GitHub
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
☆132Jul 11, 2023Updated 3 years ago