Hunter-DDM / stablemoe

Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts"

☆50

Alternatives and similar repositories for stablemoe

Users who are interested in stablemoe are comparing it to the repositories listed below.

microsoft / Stochastic-Mixture-of-Experts
This package implements THOR: Transformer with Stochastic Experts.
☆65 Updated 4 years ago
llyx97 / sparse-and-robust-PLM
[NeurIPS 2022] "A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models", Yuanxin Liu, Fandong Meng, Zheng Lin, Jiangnan Li…
☆21 Updated last year
Shwai-He / MEO
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023)
☆38 Updated last year
thunlp / MoEfication
☆140 Updated last year
Shark-NLP / CAB
☆31 Updated 2 years ago
swj0419 / in-context-pretraining
☆54 Updated last year
llyx97 / TAMT
[NAACL 2022] "Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training", Yuanxin Liu, Fandong Meng, Zheng Lin, Pe…
☆15 Updated 3 years ago
kernelmachine / demix
DEMix Layers for Modular Language Modeling
☆54 Updated 4 years ago
lancopku / MUKI
[Findings of EMNLP22] From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
☆19 Updated 2 years ago
princeton-nlp / DinkyTrain
Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃
☆114 Updated 2 years ago
yizhongw / llm-temporal-alignment
Methods and evaluation for aligning language models temporally
☆30 Updated last year
McGill-NLP / polytropon
☆54 Updated 2 years ago
ChaosCodes / ProPETL
One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning
☆40 Updated 2 years ago
princeton-nlp / LM-Kernel-FT
A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
☆78 Updated 2 years ago
thunlp / Intrinsic-Prompt-Tuning
☆18 Updated 2 years ago
allenai / hyperdecoders
Codebase for Hyperdecoders https://arxiv.org/abs/2203.08304
☆13 Updated 3 years ago
nayeon7lee / FactualityPrompt
☆86 Updated 2 years ago
lancopku / DynamicKD
Code for EMNLP 2021 main conference paper "Dynamic Knowledge Distillation for Pre-trained Language Models"
☆41 Updated 3 years ago
RZFan525 / Awesome-ScalingLaws
A curated list of awesome resources dedicated to Scaling Laws for LLMs
☆79 Updated 2 years ago
AkariAsai / ATTEMPT
This is the official repository for "Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts" (EMNLP 2022)
☆102 Updated 2 years ago
ptlmasking / maskbert
☆20 Updated 4 years ago
pkunlp-icler / ChildTuning
☆33 Updated 4 years ago
FranxYao / FlanT5-CoT-Specialization
Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.
☆132 Updated 2 years ago
sail-sg / symbolic-instruction-tuning
The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning".
☆66 Updated 2 years ago
Victorwz / VaLM
VaLM: Visually-augmented Language Modeling. ICLR 2023.
☆56 Updated 2 years ago
SALT-NLP / Adaptive-Compositional-Modules
Code for the ACL 2022 paper "Continual Sequence Generation with Adaptive Compositional Modules"
☆39 Updated 3 years ago
gmftbyGMFTBY / Rep-Dropout
[NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
☆36 Updated 2 years ago
SimiaoZuo / MoEBERT
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆112 Updated 3 years ago
gmftbyGMFTBY / MomentumDecoding
Momentum Decoding: Open-ended Text Generation as Graph Exploration
☆19 Updated 2 years ago
rabeehk / hyperformer
☆158 Updated 4 years ago