SimiaoZuo / MoEBERT

This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

☆112

Alternatives and similar repositories for MoEBERT

Users who are interested in MoEBERT compare it to the libraries listed below.

thunlp / MoEfication
☆140 Updated last year
KaiLv69 / UDR
ACL'23: Unified Demonstration Retriever for In-Context Learning
☆37 Updated last year
hemingkx / SpecDec
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
☆44 Updated last year
princeton-nlp / CEPE
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
☆165 Updated last year
princeton-nlp / CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
☆197 Updated 2 years ago
swj0419 / in-context-pretraining
☆54 Updated last year
benzakenelad / BitFit
Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
☆142 Updated 3 years ago
yegcjs / mixinglaws
☆106 Updated 3 months ago
Hunter-DDM / stablemoe
Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts"
☆50 Updated 3 years ago
jzbjyb / ReAtt
Retrieval as Attention
☆82 Updated 2 years ago
QingruZhang / PLATON
This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
☆46 Updated 3 years ago
aitsc / GLMKD
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method; GKD: A General Knowledge Distillation…
☆32 Updated 2 years ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆55 Updated 2 years ago
kssteven418 / LTP
[KDD'22] Learned Token Pruning for Transformers
☆100 Updated 2 years ago
FranxYao / FlanT5-CoT-Specialization
Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.
☆132 Updated 2 years ago
raymin0223 / fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
☆64 Updated last year
morningmoni / UniPELT
Code for paper "UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning", ACL 2022
☆63 Updated 3 years ago
ACL2023-Retrieval-LM / ACL2023-Retrieval-LM.github.io
https://acl2023-retrieval-lm.github.io/
☆156 Updated 2 years ago
thu-coai / PICL
Code for ACL2023 paper: Pre-Training to Learn in Context
☆107 Updated last year
microsoft / AdaMix
This is the implementation of the paper AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning (https://arxiv.org/abs/2205.1…
☆135 Updated 2 years ago
p-lambda / dsir
DSIR large-scale data selection framework for language model training
☆263 Updated last year
microsoft / Stochastic-Mixture-of-Experts
This package implements THOR: Transformer with Stochastic Experts.
☆65 Updated 4 years ago
AkariAsai / ATTEMPT
This is the official repository for "Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts" (EMNLP 2022)
☆102 Updated 2 years ago
txsun1997 / Black-Box-Tuning
ICML'2022: Black-Box Tuning for Language-Model-as-a-Service & EMNLP'2022: BBTv2: Towards a Gradient-Free Future with Large Language Model…
☆272 Updated 2 years ago
DRSY / EMO
[ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691)
☆126 Updated last year
princeton-nlp / TRIME
[EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674
☆196 Updated 2 years ago
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆75 Updated 11 months ago
XiangLi1999 / ContrastiveDecoding
Contrastive decoding
☆202 Updated 2 years ago
thunlp / Prompt-Transferability
On Transferability of Prompt Tuning for Natural Language Processing
☆100 Updated last year
allenai / data-efficient-finetuning
Code for paper 'Data-Efficient FineTuning'
☆28 Updated 2 years ago