PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
☆1,240 · Apr 19, 2024 · Updated last year
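For context, the paper's core mechanism is noisy top-k gating: a linear gate produces per-expert logits, tunable Gaussian noise is added during training to help balance load across experts, and only the k largest logits are kept before the softmax, so each token activates just k experts. Below is a minimal PyTorch sketch of that idea; the class and parameter names (`NoisyTopKGate`, `w_gate`, `w_noise`, `k`) are illustrative assumptions, not this repository's actual API.

```python
# Minimal sketch of noisy top-k gating (Shazeer et al., 2017).
# Names here are illustrative, not the repo's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKGate(nn.Module):
    def __init__(self, dim: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.w_gate = nn.Linear(dim, num_experts, bias=False)   # clean logits
        self.w_noise = nn.Linear(dim, num_experts, bias=False)  # noise scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.w_gate(x)
        if self.training:
            # Tunable Gaussian noise encourages load balancing across experts.
            noise_std = F.softplus(self.w_noise(x))
            logits = logits + torch.randn_like(logits) * noise_std
        # Keep the top-k logits per token; set the rest to -inf so that
        # softmax assigns them exactly zero weight (sparse gating).
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        masked = torch.full_like(logits, float("-inf")).scatter(-1, topk_idx, topk_vals)
        return F.softmax(masked, dim=-1)  # sparse weights, shape (..., num_experts)

# Usage: each row has at most k nonzero expert weights.
gate = NoisyTopKGate(dim=512, num_experts=8, k=2)
weights = gate(torch.randn(4, 512))  # shape (4, 8)
```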
Alternatives and similar repositories for mixture-of-experts
Users interested in mixture-of-experts are comparing it to the libraries listed below.
- A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models ☆848 · Sep 13, 2023 · Updated 2 years ago
- A fast MoE implementation for PyTorch ☆1,845 · Feb 10, 2025 · Updated last year
- A collection of AWESOME things about mixture-of-experts ☆1,272 · Dec 8, 2024 · Updated last year
- ☆713 · Dec 6, 2025 · Updated 3 months ago
- Tutel MoE: Optimized Mixture-of-Experts Library, supporting GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4 ☆981 · Updated this week
- A curated reading list of research in Mixture-of-Experts (MoE). ☆662 · Oct 30, 2024 · Updated last year
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch ☆379 · Jun 17, 2024 · Updated last year
- This package implements THOR: Transformer with Stochastic Experts. ☆64 · Oct 7, 2021 · Updated 4 years ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆344 · Apr 2, 2025 · Updated 11 months ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,672 · Mar 8, 2024 · Updated 2 years ago
- [NeurIPS 2022] "M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design", Hanxue … ☆136 · Nov 30, 2022 · Updated 3 years ago
- PyTorch implementation of LIMoE ☆52 · Apr 1, 2024 · Updated last year
- Implementation of AAAI 2022 paper: Go Wider Instead of Deeper ☆32 · Oct 27, 2022 · Updated 3 years ago
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆114 · May 2, 2022 · Updated 3 years ago
- ☆145 · Jul 21, 2024 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,000 · Dec 6, 2024 · Updated last year
- ☆89 · Apr 2, 2022 · Updated 3 years ago
- Fast and memory-efficient exact attention ☆22,938 · Mar 23, 2026 · Updated last week
- 【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models ☆2,310 · Jul 15, 2025 · Updated 8 months ago
- A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018) ☆735 · Mar 25, 2023 · Updated 3 years ago
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien… ☆138 · Mar 13, 2026 · Updated 2 weeks ago (top-1 switch routing is sketched after this list)
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" ☆13,366 · Dec 17, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ☆22,069 · Jan 23, 2026 · Updated 2 months ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆20,841 · Mar 18, 2026 · Updated last week
- Train transformer language models with reinforcement learning. ☆17,781 · Updated this week
- Master's thesis. Code written in Python (Keras with TensorFlow backend). ☆23 · Jun 16, 2020 · Updated 5 years ago
- Ongoing research training transformer models at scale ☆15,827 · Updated this week
- A Unified Library for Parameter-Efficient and Modular Transfer Learning ☆2,806 · Mar 21, 2026 · Updated last week
- Mamba SSM architecture ☆17,725 · Updated this week
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL) ☆9,231 · Mar 24, 2026 · Updated last week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆41,925 · Updated this week
- An open source implementation of CLIP. ☆13,579 · Mar 12, 2026 · Updated 2 weeks ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Feb 28, 2023 · Updated 3 years ago
- Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Py… ☆24,996 · Updated this week
- Code and data for the paper "Multi-Source Domain Adaptation with Mixture of Experts" (EMNLP 2018) ☆68 · Aug 30, 2020 · Updated 5 years ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆20,286 · Updated this week
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆4,692 · Updated this week
- Transformer related optimization, including BERT, GPT ☆6,405 · Mar 27, 2024 · Updated 2 years ago
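The Switch Transformers entry above simplifies the headline paper's gating to top-1 routing: each token is sent to a single expert, and the expert's output is rescaled by the gate probability so the routing decision stays differentiable. A rough PyTorch sketch under that reading follows; the names (`SwitchRouter`, `num_experts`, `hidden`) are illustrative and do not correspond to any listed repository's API.

```python
# Minimal sketch of Switch Transformer top-1 routing (Fedus et al., 2021).
# Names are illustrative, not any listed repo's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchRouter(nn.Module):
    """Routes each token to a single expert (k = 1) and scales the expert
    output by the gate probability to keep routing differentiable."""

    def __init__(self, dim: int, num_experts: int, hidden: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim); probs: (tokens, num_experts)
        probs = F.softmax(self.router(x), dim=-1)
        gate, expert_idx = probs.max(dim=-1)  # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = expert_idx == e             # tokens routed to expert e
            if sel.any():
                out[sel] = expert(x[sel]) * gate[sel].unsqueeze(-1)
        return out
```

A production router would also enforce an expert capacity limit and add the auxiliary load-balancing loss described in the Switch Transformer paper; both are omitted here for brevity.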