microsoft / Stochastic-Mixture-of-ExpertsLinks

This package implements THOR: Transformer with Stochastic Experts.

☆65

Alternatives and similar repositories for Stochastic-Mixture-of-Experts

Users that are interested in Stochastic-Mixture-of-Experts are comparing it to the libraries listed below

Sorting:

Hunter-DDM / stablemoe
Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts"
☆48Updated 3 years ago
dguo98 / DiffPruning
Parameter Efficient Transfer Learning with Diff Pruning
☆74Updated 4 years ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆53Updated 2 years ago
asappresearch / flop
Pytorch library for factorized L0-based pruning.
☆45Updated last year
thunlp / MoEfication
☆139Updated last year
kssteven418 / LTP
[KDD'22] Learned Token Pruning for Transformers
☆98Updated 2 years ago
princeton-nlp / LM-Kernel-FT
A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
☆78Updated last year
huggingface / block_movement_pruning
Block Sparse movement pruning
☆81Updated 4 years ago
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆86Updated 2 years ago
QingruZhang / PLATON
This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
☆46Updated 2 years ago
VITA-Group / BERT-Tickets
[NeurIPS 2020] "The Lottery Ticket Hypothesis for Pre-trained BERT Networks", Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Ya…
☆140Updated 3 years ago
varunnair18 / FISH
Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).
☆59Updated 3 years ago
rabeehk / hyperformer
☆157Updated 3 years ago
cambridgeltl / autopeft
AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning (Zhou et al.; TACL 2024)
☆46Updated last year
SimiaoZuo / MoEBERT
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆109Updated 3 years ago
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆119Updated last year
VITA-Group / EarlyBERT
[ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets" by Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, …
☆18Updated 3 years ago
benzakenelad / BitFit
Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
☆142Updated 2 years ago
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80Updated last year
Shark-NLP / CAB
☆31Updated 2 years ago
princeton-nlp / CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
☆196Updated 2 years ago
ag1988 / top_k_attention
The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha…
☆69Updated 3 years ago
Noahs-ARK / RFA
☆33Updated 4 years ago
berlino / gated_linear_attention
☆106Updated last year
microsoft / SparseMixer
Sparse Backpropagation for Mixture-of-Expert Training
☆30Updated last year
sanagno / adaptively_sparse_attention
☆21Updated 2 years ago
McGill-NLP / polytropon
☆54Updated 2 years ago
Shwai-He / MEO
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":
☆38Updated last year
ChaosCodes / ProPETL
One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning
☆40Updated 2 years ago
BaohaoLiao / mefts
[NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
☆31Updated 2 years ago