thu-ml / ReMoE

[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆65 Updated 3 months ago

Alternatives and similar repositories for ReMoE:

Users who are interested in ReMoE compare it to the libraries listed below.