PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
☆1,243 · Apr 19, 2024 · Updated 2 years ago
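The layer named in the title routes each input to a small subset of expert networks via top-k gating: keep only the k largest gate logits, renormalize them with a softmax, and zero the rest. A minimal pure-Python sketch of that gating step (the repo itself uses PyTorch; the function name and example logits here are illustrative, not from the repo):

```python
import math

def top_k_gating(logits, k):
    """Keep the k largest gate logits, softmax over them, zero the rest."""
    # Indices of the k largest logits (the selected experts).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts.
    exps = {i: math.exp(logits[i]) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]

gates = top_k_gating([2.0, 1.0, 0.1, -1.0], k=2)
# → weights ≈ [0.731, 0.269, 0.0, 0.0]; only 2 of 4 experts are active.
```

Because the non-selected gates are exactly zero, the corresponding experts never run, which is what lets these models grow parameter count without a matching growth in compute per token.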
Alternatives and similar repositories for mixture-of-experts
Users that are interested in mixture-of-experts are comparing it to the libraries listed below.
- A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models ☆854 · Sep 13, 2023 · Updated 2 years ago
- A fast MoE implementation for PyTorch ☆1,849 · Feb 10, 2025 · Updated last year
- A collection of AWESOME things about mixture-of-experts ☆1,274 · Dec 8, 2024 · Updated last year
- ☆715 · Dec 6, 2025 · Updated 4 months ago
- Tutel MoE: an optimized Mixture-of-Experts library, supporting GptOss/DeepSeek/Kimi-K2/Qwen3 with FP8/NVFP4/MXFP4 ☆984 · Apr 11, 2026 · Updated last week
- A curated reading list of research in Mixture-of-Experts (MoE) ☆663 · Oct 30, 2024 · Updated last year
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch ☆381 · Jun 17, 2024 · Updated last year
- This package implements THOR: Transformer with Stochastic Experts ☆64 · Oct 7, 2021 · Updated 4 years ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆345 · Apr 2, 2025 · Updated last year
- PyTorch implementation of moe, which stands for mixture of experts ☆53 · Feb 11, 2021 · Updated 5 years ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,675 · Mar 8, 2024 · Updated 2 years ago
- [NeurIPS 2022] "M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design", Hanxue … ☆136 · Nov 30, 2022 · Updated 3 years ago
- PyTorch implementation of LIMoE ☆52 · Apr 1, 2024 · Updated 2 years ago
- Implementation of AAAI 2022 paper: "Go Wider Instead of Deeper" ☆32 · Oct 27, 2022 · Updated 3 years ago
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022) ☆114 · May 2, 2022 · Updated 3 years ago
- ☆145 · Jul 21, 2024 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,000 · Dec 6, 2024 · Updated last year
- ☆89 · Apr 2, 2022 · Updated 4 years ago
- Fast and memory-efficient exact attention ☆23,344 · Updated this week
- [TMM 2025] Mixture-of-Experts for Large Vision-Language Models ☆2,314 · Jul 15, 2025 · Updated 9 months ago
- A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018) ☆736 · Mar 25, 2023 · Updated 3 years ago
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien… ☆138 · Apr 13, 2026 · Updated last week
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" ☆13,435 · Dec 17, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ☆22,086 · Jan 23, 2026 · Updated 2 months ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning ☆20,929 · Apr 10, 2026 · Updated last week
- Master's thesis; code written in Python (Keras with TensorFlow backend) ☆23 · Jun 16, 2020 · Updated 5 years ago
- Train transformer language models with reinforcement learning ☆18,054 · Updated this week
- Ongoing research training transformer models at scale ☆16,073 · Updated this week
- A Unified Library for Parameter-Efficient and Modular Transfer Learning ☆2,810 · Mar 21, 2026 · Updated 3 weeks ago
- Mamba SSM architecture ☆17,979 · Updated this week
- An easy-to-use, scalable, and high-performance agentic RL framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy… ☆9,340 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective ☆42,141 · Updated this week
- An open source implementation of CLIP ☆13,695 · Apr 6, 2026 · Updated last week
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Feb 28, 2023 · Updated 3 years ago
- Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Py… ☆25,043 · Apr 6, 2026 · Updated last week
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆20,789 · Updated this week
- 🚀 Efficient implementations for emerging model architectures ☆4,878 · Updated this week
- Transformer related optimization, including BERT, GPT ☆6,412 · Mar 27, 2024 · Updated 2 years ago