piotrpiekos / MoSA

User-friendly implementation of Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head via expert-choice routing, providing a content-based sparse attention mechanism.
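The idea above can be sketched in a few lines: each head acts like an expert that scores all tokens and keeps only its top-k, then runs ordinary attention over just those tokens. This is a minimal illustrative sketch, not the repository's actual API; all function names, shapes, and the NumPy setting are assumptions.

```python
# Hypothetical sketch of expert-choice token routing for sparse attention.
# Names and shapes are assumptions, not the MoSA repository's real interface.
import numpy as np

def expert_choice_select(x, router_w, k):
    """Each head (a column of router_w) scores all tokens and keeps its top-k.

    x:        (seq_len, d_model) token embeddings
    router_w: (d_model, n_heads) per-head routing weights
    k:        number of tokens kept per head
    Returns an (n_heads, k) array of selected token indices.
    """
    scores = x @ router_w                    # (seq_len, n_heads)
    idx = np.argsort(-scores, axis=0)[:k]    # top-k token indices per head
    return idx.T                             # (n_heads, k)

def sparse_head_attention(x, wq, wk, wv, token_idx):
    """Scaled dot-product attention restricted to one head's selected tokens."""
    sel = x[token_idx]                       # (k, d_model) gathered tokens
    q, kk, v = sel @ wq, sel @ wk, sel @ wv
    att = q @ kk.T / np.sqrt(q.shape[-1])
    att = np.exp(att - att.max(axis=-1, keepdims=True))
    att /= att.sum(axis=-1, keepdims=True)   # softmax over selected tokens only
    return att @ v                           # (k, d_head)

rng = np.random.default_rng(0)
seq_len, d_model, d_head, n_heads, k = 16, 8, 4, 2, 4
x = rng.normal(size=(seq_len, d_model))
router_w = rng.normal(size=(d_model, n_heads))
idx = expert_choice_select(x, router_w, k)   # each head picks its own tokens
wq, wk, wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = sparse_head_attention(x, wq, wk, wv, idx[0])
print(idx.shape, out.shape)
```

Because selection is driven by the router scores (a function of token content), different heads attend to different token subsets, which is what makes the sparsity content-based rather than positional.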
