lucidrains / coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick the top-k.
47 · Jul 16, 2023 · Updated 2 years ago
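
The description does not say how coordinate descent picks the top-k, so the following is only a minimal sketch of one known formulation: a relaxed (soft) top-k in which two dual variables are updated alternately, one pushing the weights to sum to k and one capping each weight at 1. The function name `soft_topk` and the parameters `n_iters` and `eps` are hypothetical and chosen for illustration; the repository's actual code may differ.

```python
import math

def soft_topk(scores, k, n_iters=50, eps=0.1):
    """Relaxed top-k: returns one weight in [0, 1] per score.

    Hypothetical helper, not the repository's API. `eps` is a temperature;
    smaller values give a sharper (closer to hard) selection.
    """
    log_k = math.log(k)
    a = 0.0                        # shared offset, pushes the weights to sum to k
    b = [-s for s in scores]       # per-item offset, caps each weight at 1
    for _ in range(n_iters):
        # Coordinate step on a: choose it so the weights sum to k for the
        # current b, using a numerically stable log-sum-exp.
        z = [(s + b_i) / eps for s, b_i in zip(scores, b)]
        m = max(z)
        lse = m + math.log(sum(math.exp(v - m) for v in z))
        a = eps * (log_k - lse)
        # Coordinate step on b: clip so that no weight exceeds 1.
        b = [-max(s + a, 0.0) for s in scores]
    return [math.exp((s + a + b_i) / eps) for s, b_i in zip(scores, b)]

# Usage: the two largest scores should receive weights close to 1.
weights = soft_topk([3.0, 1.0, 0.5, 2.5, -1.0], k=2)
print([round(w, 3) for w in weights])
```

Because the weights are produced by smooth operations rather than a hard sort, a selection like this can in principle be trained through with gradients, which is one plausible reason to prefer it over a hard top-k inside an attention layer.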

Alternatives and similar repositories for coordinate-descent-attention

Users that are interested in coordinate-descent-attention are comparing it to the libraries listed below.
