LOG-postech / rethinking-LLM-pruningLinks

☆28

Alternatives and similar repositories for rethinking-LLM-pruning

Users that are interested in rethinking-LLM-pruning are comparing it to the libraries listed below

Sorting:

Nota-NetsPresso / shortened-llm
Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]
☆88Updated last year
SempraETY / Pruning-via-Merging
☆22Updated 11 months ago
raymin0223 / fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
☆64Updated last year
hdong920 / GRIFFIN
☆38Updated last year
kssteven418 / BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆94Updated last year
yxli2123 / LoSparse
☆61Updated 2 years ago
liyunqianggyn / Awesome-LLMs-Pruning
Awesome LLM pruning papers all-in-one repository with integrating all useful resources and insights.
☆131Updated 3 months ago
hdong920 / LESS
☆53Updated last year
luuyin / OWL
Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
☆73Updated 4 months ago
edong6768 / Malet
🔨 Malet (Machine Learning Experiment Tool) is a tool for efficient machine learning experiment execution, logging, analysis, and plot ma…
☆17Updated 6 months ago
JingXuTHU / Random-Masking-Finds-Winning-Tickets-for-Parameter-Efficient-Fine-tuning
☆14Updated last year
jiwonsong-dev / SLEB
[ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
☆37Updated 9 months ago
biomedical-cybernetics / Relative-importance-and-activation-pruning
☆52Updated last year
zyxxmu / DSnoT
Official Pytorch Implementation of Our Paper Accepted at ICLR 2024-- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM…
☆50Updated last year
alvin-zyl / CoLA
Implementation of CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
☆24Updated 8 months ago
QingruZhang / PLATON
This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
☆46Updated 3 years ago
henryzhongsc / longctx_bench
Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…
☆86Updated 8 months ago
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆66Updated last year
kyrie-23 / linear_task_arithmetic
☆11Updated 3 months ago
ChandlerGuan / Transkimmer
Code for ACL2022 publication Transkimmer: Transformer Learns to Layer-wise Skim
☆21Updated 3 years ago
Jingyu6 / speculative_prefill
☆44Updated 5 months ago
CASIA-IVA-Lab / FLAP
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆62Updated last year
IST-DASLab / EvoPress
☆36Updated 3 months ago
yangyifei729 / LaCo
Official implementation for LaCo (EMNLP 2024 Findings)
☆17Updated last year
mutonix / pyramidinfer
☆48Updated 11 months ago
zyxxmu / cam
Pytorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference
☆47Updated last year
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆67Updated 7 months ago
hmarkc / parallel-prompt-decoding
Efficient LLM Inference Acceleration using Prompting
☆50Updated last year
mmatena / model_merging
☆78Updated 3 years ago
jongwooko / distillm-2
Official PyTorch implementation of DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs (ICML 2025 Oral)
☆46Updated 4 months ago