lucidrains / coordinate-descent-attention

Implementation of an attention layer in which each head can attend to more than one token, using coordinate descent to select the top-k
46 · Updated last year

Alternatives and similar repositories for coordinate-descent-attention:

Users who are interested in coordinate-descent-attention are comparing it to the libraries listed below.