PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
☆1,246 · Apr 19, 2024 · Updated 2 years ago
Alternatives and similar repositories for mixture-of-experts
Users that are interested in mixture-of-experts are comparing it to the libraries listed below.
- A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models ☆862 · Sep 13, 2023 · Updated 2 years ago
- A fast MoE impl for PyTorch ☆1,855 · Feb 10, 2025 · Updated last year
- A collection of AWESOME things about mixture-of-experts ☆1,280 · Dec 8, 2024 · Updated last year
- ☆725 · Jun 6, 2026 · Updated last week
- Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4 ☆993 · Jun 4, 2026 · Updated 2 weeks ago
- A curated reading list of research in Mixture-of-Experts (MoE). ☆667 · Oct 30, 2024 · Updated last year
- Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆385 · Jun 17, 2024 · Updated 2 years ago
- This package implements THOR: Transformer with Stochastic Experts. ☆64 · Oct 7, 2021 · Updated 4 years ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆347 · Apr 2, 2025 · Updated last year
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,687 · Mar 8, 2024 · Updated 2 years ago
- [NeurIPS 2022] “M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design”, Hanxue … ☆136 · Nov 30, 2022 · Updated 3 years ago
- PyTorch implementation of LIMoE ☆52 · Apr 1, 2024 · Updated 2 years ago
- Implementation of AAAI 2022 Paper: Go wider instead of deeper ☆32 · Oct 27, 2022 · Updated 3 years ago
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆114 · May 2, 2022 · Updated 4 years ago
- ☆145 · Jul 21, 2024 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,003 · Dec 6, 2024 · Updated last year
- ☆92 · Apr 2, 2022 · Updated 4 years ago
- Fast and memory-efficient exact attention ☆24,170 · Updated this week
- 【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models ☆2,321 · Jul 15, 2025 · Updated 11 months ago
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien… ☆142 · May 11, 2026 · Updated last month
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" ☆13,590 · Dec 17, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ☆22,149 · Jan 23, 2026 · Updated 4 months ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆21,273 · Updated this week
- A Unified Library for Parameter-Efficient and Modular Transfer Learning ☆2,814 · Apr 26, 2026 · Updated last month
- Train transformer language models with reinforcement learning. ☆18,663 · Updated this week
- Ongoing research training transformer models at scale ☆16,687 · Updated this week
- Mamba SSM architecture ☆18,455 · Updated this week
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy… ☆9,652 · Jun 9, 2026 · Updated last week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆42,508 · Updated this week
- An open source implementation of CLIP. ☆13,907 · Jun 11, 2026 · Updated last week
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Feb 28, 2023 · Updated 3 years ago
- Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Py… ☆25,321 · Updated this week
- Code and data for the paper "Multi-Source Domain Adaptation with Mixture of Experts" (EMNLP 2018) ☆68 · Aug 30, 2020 · Updated 5 years ago
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework ☆21,969 · Updated this week
- Transformer related optimization, including BERT, GPT ☆6,422 · Mar 27, 2024 · Updated 2 years ago
- [TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ☆505 · Jun 7, 2026 · Updated last week
- Official PyTorch Implementation of "Scalable Diffusion Models with Transformers" ☆8,628 · May 31, 2024 · Updated 2 years ago
- 🚀 Efficient implementations for emerging model architectures ☆5,227 · Jun 11, 2026 · Updated last week