XueFuzhao/awesome-mixture-of-experts

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/XueFuzhao/awesome-mixture-of-experts)

XueFuzhao / awesome-mixture-of-experts

A collection of AWESOME things about mixture-of-experts

☆1,281

Alternatives and similar repositories for awesome-mixture-of-experts

Users that are interested in awesome-mixture-of-experts are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

codecaution / Awesome-Mixture-of-Experts-Papers
View on GitHub
A curated reading list of research in Mixture-of-Experts(MoE).
☆669Oct 30, 2024Updated last year
XueFuzhao / OpenMoE
View on GitHub
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
☆1,691Mar 8, 2024Updated 2 years ago
davidmrau / mixture-of-experts
View on GitHub
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
☆1,245Apr 19, 2024Updated 2 years ago
laekov / fastmoe
View on GitHub
A fast MoE impl for PyTorch
☆1,855Feb 10, 2025Updated last year
pjlab-sys4nlp / llama-moe
View on GitHub
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
☆1,004Dec 6, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
lucidrains / mixture-of-experts
View on GitHub
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
☆862Sep 13, 2023Updated 2 years ago
microsoft / Tutel
View on GitHub
Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
☆996Jul 8, 2026Updated last week
google-research / vmoe
View on GitHub
☆725Jul 2, 2026Updated 2 weeks ago
lucidrains / st-moe-pytorch
View on GitHub
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆385Jun 17, 2024Updated 2 years ago
lucidrains / soft-moe-pytorch
View on GitHub
Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch
☆348Apr 2, 2025Updated last year
withinmiaov / A-Survey-on-Mixture-of-Experts-in-LLMs
View on GitHub
[TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".
☆504Jun 7, 2026Updated last month
YeonwooSung / LIMoE-pytorch
View on GitHub
PyTorch implementation of LIMoE
☆52Apr 1, 2024Updated 2 years ago
Cohere-Labs-Community / parameter-efficient-moe
View on GitHub
☆277Oct 31, 2023Updated 2 years ago
PKU-YuanGroup / MoE-LLaVA
View on GitHub
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
☆2,322Jul 15, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
thu-pacman / FasterMoE
View on GitHub
☆92Apr 2, 2022Updated 4 years ago
databricks / megablocks
View on GitHub
☆1,582Mar 25, 2026Updated 3 months ago
fkodom / soft-mixture-of-experts
View on GitHub
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
☆83Oct 5, 2023Updated 2 years ago
shawntan / scattermoe
View on GitHub
Triton-based implementation of Sparse Mixture of Experts.
☆281Oct 3, 2025Updated 9 months ago
OpenRLHF / OpenRLHF
View on GitHub
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,828Jul 14, 2026Updated last week
BradyFU / Awesome-Multimodal-Large-Language-Models
View on GitHub
Latest Advances on Multimodal Large Language Models
☆17,952Jul 2, 2026Updated 2 weeks ago
Luodian / Generalizable-Mixture-of-Experts
View on GitHub
GMoE could be the next backbone model for many kinds of generalization task.
☆275Mar 21, 2023Updated 3 years ago
allenai / OLMoE
View on GitHub
OLMoE: Open Mixture-of-Experts Language Models
☆1,041Sep 23, 2025Updated 9 months ago
UNITES-Lab / MC-SMoE
View on GitHub
[ICLR‘24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆108Jun 20, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
deepseek-ai / DeepSeek-MoE
View on GitHub
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
☆1,951Jan 16, 2024Updated 2 years ago
Xnhyacinth / Awesome-LLM-Long-Context-Modeling
View on GitHub
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
☆2,146Jul 1, 2026Updated 2 weeks ago
FranxYao / chain-of-thought-hub
View on GitHub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,777Aug 4, 2024Updated last year
Dao-AILab / flash-attention
View on GitHub
Fast and memory-efficient exact attention
☆24,497Updated this week
verl-project / verl
View on GitHub
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,571Updated this week
NVIDIA / Megatron-LM
View on GitHub
Ongoing research training transformer models at scale
☆17,125Updated this week
VITA-Group / Random-MoE-as-Dropout
View on GitHub
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆56Feb 28, 2023Updated 3 years ago
deepspeedai / Megatron-DeepSpeed
View on GitHub
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆2,257Aug 14, 2025Updated 11 months ago
EleutherAI / lm-evaluation-harness
View on GitHub
A framework for few-shot evaluation of language models.
☆13,340Jul 13, 2026Updated last week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
HuangOwen / Awesome-LLM-Compression
View on GitHub
Awesome LLM compression research papers and tools.
☆1,853Jun 30, 2026Updated 2 weeks ago
fla-org / flash-linear-attention
View on GitHub
🚀 Efficient implementations for emerging model architectures
☆5,379Updated this week
LINs-lab / DynMoE
View on GitHub
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆161Jul 9, 2025Updated last year
koayon / awesome-adaptive-computation
View on GitHub
A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).
☆163Jan 1, 2025Updated last year
xlite-dev / Awesome-LLM-Inference
View on GitHub
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
☆5,404Jun 23, 2026Updated 3 weeks ago
allenai / open-instruct
View on GitHub
AllenAI's post-training codebase
☆3,801Updated this week
microsoft / Stochastic-Mixture-of-Experts
View on GitHub
This package implements THOR: Transformer with Stochastic Experts.
☆64Oct 7, 2021Updated 4 years ago