microsoft / SparseMixer

Sparse Backpropagation for Mixture-of-Expert Training
17 · Updated 2 months ago

Related projects: