Tutel MoE: Optimized Mixture-of-Experts Library, supporting GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
☆965, updated Dec 21, 2025
Alternatives and similar repositories for Tutel
Users interested in Tutel are comparing it to the libraries listed below.
- A fast MoE impl for PyTorch (☆1,840, updated Feb 10, 2025)
- PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al., https://arxiv.org/abs/1701.06538 (☆1,231, updated Apr 19, 2024); a minimal sketch of this routing scheme follows the list below
- Microsoft Collective Communication Library (☆384, updated Sep 20, 2023)
- A collection of AWESOME things about mixture-of-experts (☆1,266, updated Dec 8, 2024)
- PyTorch extensions for high performance and large scale training. (☆3,400, updated Apr 26, 2025)
- Training and serving large-scale neural networks with auto parallelization. (☆3,183, updated Dec 9, 2023)
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models (☆1,660, updated Mar 8, 2024)
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… (☆3,170, updated Feb 21, 2026)
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training (☆1,861, updated Feb 20, 2026)
- Compiler for Dynamic Neural Networks (☆45, updated Nov 13, 2023)
- Transformer related optimization, including BERT, GPT (☆6,394, updated Mar 27, 2024)
- Ongoing research training transformer models at scale (☆15,242, updated Feb 21, 2026)
- Triton-based implementation of Sparse Mixture of Experts. (☆266, updated Oct 3, 2025)
- A curated reading list of research in Mixture-of-Experts (MoE). (☆660, updated Oct 30, 2024)
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 (☆2,229, updated Aug 14, 2025)
- Synthesizer for optimal collective communication algorithms (☆124, updated Apr 8, 2024)
- Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs (☆938, updated Nov 27, 2025)
- Microsoft Automatic Mixed Precision Library (☆636, updated Dec 1, 2025)
- Ring attention implementation with flash attention (☆986, updated Sep 10, 2025)
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. (☆1,261, updated Aug 28, 2025)
- This package implements THOR: Transformer with Stochastic Experts. (☆65, updated Oct 7, 2021)
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) (☆1,001, updated Dec 6, 2024)
- MSCCL++: A GPU-driven communication stack for scalable AI applications (☆469, updated Feb 21, 2026)
- nnScaler: Compiling DNN models for Parallel Training (☆124, updated Sep 23, 2025)
- Distributed Compiler based on Triton for Parallel Systems (☆1,361, updated Feb 13, 2026)
- Fast and memory-efficient exact attention (☆22,361, updated this week)
- Large Context Attention (☆769, updated Oct 13, 2025)
- Development repository for the Triton language and compiler (☆18,460, updated Feb 22, 2026)
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. (☆1,075, updated Apr 17, 2024)
- An experimental parallel training platform (☆56, updated Mar 25, 2024)
- Hackable and optimized Transformers building blocks, supporting a composable construction. (☆10,353, updated Feb 20, 2026)
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (☆1,893, updated Jan 16, 2024)
- FlashInfer: Kernel Library for LLM Serving (☆5,009, updated this week)
- A PyTorch native platform for training generative AI models (☆5,098, updated this week)
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 (☆1,436, updated Mar 20, 2024)
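For context, a minimal sketch of the top-k sparsely gated MoE routing that these libraries implement and optimize, written in plain PyTorch. All names, shapes, and hyperparameters are illustrative assumptions, not the API of Tutel or any listed project:

```python
# Minimal sketch (not any library's actual API): a top-2 sparsely gated
# Mixture-of-Experts layer in plain PyTorch, following the routing idea of
# Shazeer et al. (2017). Expert count and hidden sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.gate(x)                    # (tokens, num_experts)
        top_vals, top_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)    # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(TopKMoE()(tokens).shape)                   # torch.Size([16, 512])
```

The per-expert Python loop is only for clarity; production MoE libraries such as those above replace it with fused dispatch/combine kernels and all-to-all communication for expert parallelism.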