pprp / Awesome-LLM-PruneLinks

Awesome list for LLM pruning.

☆267

Alternatives and similar repositories for Awesome-LLM-Prune

Users that are interested in Awesome-LLM-Prune are comparing it to the libraries listed below

Sorting:

liyunqianggyn / Awesome-LLMs-Pruning
Awesome LLM pruning papers all-in-one repository with integrating all useful resources and insights.
☆129Updated 2 months ago
pprp / Awesome-LLM-Quantization
Awesome list for LLM quantization
☆326Updated 2 weeks ago
hemingkx / Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆320Updated 6 months ago
dilab-zju / self-speculative-decoding
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆205Updated 8 months ago
biomedical-cybernetics / Relative-importance-and-activation-pruning
☆51Updated last year
October2001 / Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
☆566Updated 3 weeks ago
jy-yuan / KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆329Updated last month
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆120Updated 6 months ago
CASIA-IVA-Lab / FLAP
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆61Updated last year
FasterDecoding / SnapKV
☆284Updated 3 months ago
luuyin / OWL
Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
☆72Updated 3 months ago
tiingweii-shii / Awesome-Resource-Efficient-LLM-Papers
a curated list of high-quality papers on resource-efficient LLMs 🌱
☆141Updated 7 months ago
FMInference / H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
☆482Updated last year
hemingkx / SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆988Updated this week
Hsu1023 / DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆171Updated last year
Zefan-Cai / Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
☆376Updated 7 months ago
FMInference / DejaVu
☆343Updated last year
HArmonizedSS / HASS
Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)
☆47Updated 7 months ago
thu-nics / qllm-eval
Code Repository of Evaluating Quantized Large Language Models
☆133Updated last year
facebookresearch / LLM-QAT
Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"
☆317Updated 7 months ago
FFY0 / AdaKV
The Official Implementation of Ada-KV [NeurIPS 2025]
☆106Updated last month
RUCKBReasoning / LLM-Streamline
Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models"
☆30Updated 5 months ago
TianjinYellow / EdgeDeviceLLMCompetition-Starting-Kit
☆43Updated 11 months ago
pprp / Awesome-Efficient-MoE
Efficient Mixture of Experts for LLM Paper List
☆140Updated 3 weeks ago
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆67Updated 6 months ago
henryzhongsc / longctx_bench
Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…
☆86Updated 7 months ago
Equationliu / Kangaroo
[NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…
☆60Updated last year
shadowpa0327 / Palu
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
☆143Updated 8 months ago
UbiquitousLearning / Efficient_Foundation_Model_Survey
Survey Paper List - Efficient LLM and Foundation Models
☆258Updated last year
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆338Updated 3 months ago