kssteven418 / LTP

[KDD'22] Learned Token Pruning for Transformers

☆98

Alternatives and similar repositories for LTP

Users who are interested in LTP are comparing it to the libraries listed below.


princeton-nlp / CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
☆196 Updated 2 years ago
huggingface / block_movement_pruning
Block Sparse movement pruning
☆81 Updated 4 years ago
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆119 Updated last year
microsoft / Stochastic-Mixture-of-Experts
This package implements THOR: Transformer with Stochastic Experts.
☆65 Updated 3 years ago
kssteven418 / BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆93 Updated last year
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆86 Updated 2 years ago
SimiaoZuo / MoEBERT
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆109 Updated 3 years ago
yxli2123 / LoSparse
☆59 Updated last year
Hunter-DDM / stablemoe
Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts"
☆48 Updated 3 years ago
thunlp / MoEfication
☆139 Updated last year
QingruZhang / PLATON
This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
☆46 Updated 2 years ago
WoosukKwon / retraining-free-pruning
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
☆191 Updated 2 years ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆53 Updated 2 years ago
berlino / gated_linear_attention
☆106 Updated last year
CASIA-IVA-Lab / FLAP
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆57 Updated last year
mitchellgordon95 / bert-prune
☆17 Updated 5 years ago
txsun1997 / awesome-early-exiting
A curated list of Early Exiting papers, benchmarks, and misc.
☆117 Updated last year
benzakenelad / BitFit
Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
☆142 Updated 2 years ago
sanagno / adaptively_sparse_attention
☆21 Updated 2 years ago
IBM / PoWER-BERT
Method to improve inference time for BERT. This is an implementation of the paper titled "PoWER-BERT: Accelerating BERT Inference via Pro…
☆61 Updated 2 months ago
hdong920 / LESS
☆50 Updated last year
ChandlerGuan / Transkimmer
Code for the ACL 2022 publication "Transkimmer: Transformer Learns to Layer-wise Skim"
☆21 Updated 2 years ago
rycolab / differentiable-subset-pruning
☆15 Updated 3 years ago
dguo98 / DiffPruning
Parameter Efficient Transfer Learning with Diff Pruning
☆74 Updated 4 years ago
han-shi / SparseBERT
☆13 Updated 2 years ago
epfml / dynamic-sparse-flash-attention
☆147 Updated 2 years ago
ziplab / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆29 Updated last year
raymin0223 / fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
☆61 Updated 10 months ago
ruikangliu / IntactKV
[ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact"
☆46 Updated last year
HazyResearch / fly
☆210 Updated 2 years ago