lucidrains / mixture-of-attention

Some personal experiments around routing tokens to different autoregressive attention modules, akin to mixture-of-experts
108 · Updated last month

Related projects

Alternatives and complementary repositories for mixture-of-attention