IBM / ModuleFormer
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.
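To illustrate the mixture-of-experts idea behind the feedforward experts, here is a minimal NumPy sketch of top-k expert routing. This is not the ModuleFormer implementation; the function `moe_forward`, the single-layer tanh "experts", and all shapes are illustrative assumptions.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Toy MoE feedforward layer: route each token to its top-k experts
    and combine their outputs, weighted by renormalized gate scores.

    x:         (tokens, d)            input activations
    gate_w:    (d, n_experts)         gating projection
    expert_ws: (n_experts, d, d)      one weight matrix per toy expert
    """
    logits = x @ gate_w                               # (tokens, n_experts)
    # Softmax over experts for the gating distribution.
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    top = np.argsort(-probs, axis=-1)[:, :top_k]      # indices of top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        weights = probs[t, top[t]]
        weights = weights / weights.sum()             # renormalize chosen gates
        for w, e in zip(weights, top[t]):
            # Each "expert" here is just a single tanh layer for brevity.
            out[t] += w * np.tanh(x[t] @ expert_ws[e])
    return out
```

Because only `top_k` experts run per token, compute grows with `top_k`, not with the total number of experts, which is what lets MoE models scale parameter count cheaply.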
226 stars · Updated Sep 18, 2025

Alternatives and similar repositories for ModuleFormer

Users interested in ModuleFormer are comparing it to the libraries listed below.
