PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
☆1,242 · Apr 19, 2024 · Updated 2 years ago
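For orientation, below is a minimal sketch of the noisy top-k gating described in the Shazeer et al. paper that this repository re-implements. The class name `NoisyTopKGate`, the shapes, and the dense expert combination at the end are illustrative assumptions, not this repository's actual API; a production MoE layer dispatches each token only to its selected experts instead of running all of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKGate(nn.Module):
    """Noisy top-k gating in the style of Shazeer et al. (2017): only the k
    highest-scoring experts receive nonzero weight for each input."""
    def __init__(self, d_model: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.w_gate = nn.Linear(d_model, num_experts, bias=False)   # clean gating logits
        self.w_noise = nn.Linear(d_model, num_experts, bias=False)  # learned noise scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) -> sparse gate weights: (batch, num_experts)
        logits = self.w_gate(x)
        if self.training:
            # Tunable Gaussian noise, as in the paper, helps balance expert load.
            logits = logits + torch.randn_like(logits) * F.softplus(self.w_noise(x))
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        # Mask everything outside the top k to -inf so softmax gives it zero weight.
        masked = torch.full_like(logits, float("-inf")).scatter(-1, topk_idx, topk_vals)
        return F.softmax(masked, dim=-1)

# Illustrative usage (dense combination, for clarity only).
gate = NoisyTopKGate(d_model=16, num_experts=4, k=2)
experts = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
x = torch.randn(8, 16)
weights = gate(x)                                        # (8, 4), two nonzero entries per row
expert_outs = torch.stack([e(x) for e in experts], -1)   # (8, 16, 4)
y = (expert_outs * weights.unsqueeze(1)).sum(-1)         # (8, 16) mixture output
```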
Alternatives and similar repositories for mixture-of-experts
Users interested in mixture-of-experts are comparing it to the libraries listed below.
- A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models ☆859 · Sep 13, 2023 · Updated 2 years ago
- A fast MoE impl for PyTorch ☆1,846 · Feb 10, 2025 · Updated last year
- A collection of AWESOME things about mixture-of-experts ☆1,275 · Dec 8, 2024 · Updated last year
- ☆717 · Dec 6, 2025 · Updated 5 months ago
- Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4 ☆988 · Apr 11, 2026 · Updated 3 weeks ago
- A curated reading list of research in Mixture-of-Experts (MoE). ☆663 · Oct 30, 2024 · Updated last year
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆383 · Jun 17, 2024 · Updated last year
- This package implements THOR: Transformer with Stochastic Experts. ☆64 · Oct 7, 2021 · Updated 4 years ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆345 · Apr 2, 2025 · Updated last year
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,678 · Mar 8, 2024 · Updated 2 years ago
- [NeurIPS 2022] “M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design”, Hanxue … ☆136 · Nov 30, 2022 · Updated 3 years ago
- PyTorch implementation of LIMoE ☆52 · Apr 1, 2024 · Updated 2 years ago
- Implementation of AAAI 2022 Paper: Go wider instead of deeper ☆32 · Oct 27, 2022 · Updated 3 years ago
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆114 · May 2, 2022 · Updated 4 years ago
- ☆145 · Jul 21, 2024 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,001 · Dec 6, 2024 · Updated last year
- ☆91 · Apr 2, 2022 · Updated 4 years ago
- Fast and memory-efficient exact attention ☆23,628 · May 3, 2026 · Updated last week
- A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018) ☆735 · Mar 25, 2023 · Updated 3 years ago
- 【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models ☆2,314 · Jul 15, 2025 · Updated 9 months ago
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien… ☆139 · Apr 13, 2026 · Updated 3 weeks ago
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" ☆13,501 · Dec 17, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ☆22,114 · Jan 23, 2026 · Updated 3 months ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆21,052 · May 1, 2026 · Updated last week
- Master Thesis. Code written in python. (Keras with Tensorflow backend) ☆23 · Jun 16, 2020 · Updated 5 years ago
- Train transformer language models with reinforcement learning. ☆18,282 · Updated this week
- Ongoing research training transformer models at scale ☆16,253 · Updated this week
- A Unified Library for Parameter-Efficient and Modular Transfer Learning ☆2,812 · Apr 26, 2026 · Updated last week
- Mamba SSM architecture ☆18,167 · Updated this week
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy… ☆9,441 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆42,281 · Updated this week
- An open source implementation of CLIP. ☆13,770 · Apr 30, 2026 · Updated last week
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Feb 28, 2023 · Updated 3 years ago
- Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Py… ☆25,139 · May 1, 2026 · Updated last week
- Code and data for the paper "Multi-Source Domain Adaptation with Mixture of Experts" (EMNLP 2018) ☆68 · Aug 30, 2020 · Updated 5 years ago
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework ☆21,046 · Apr 30, 2026 · Updated last week
- Transformer related optimization, including BERT, GPT ☆6,415 · Mar 27, 2024 · Updated 2 years ago
- [TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ☆491 · Jul 23, 2025 · Updated 9 months ago