alexisrozhkov / dilated-self-attention

Implementation of dilated self-attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens".
13 stars · Updated last year
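As background for the technique the repository implements: LongNet's dilated attention splits the sequence into segments and, within each segment, keeps only every r-th position, so attention cost shrinks as the dilation rate grows. The sketch below illustrates only this index-selection pattern; the names `w`, `r`, and `dilated_indices` are illustrative and not taken from the repository's code.

```python
# Minimal sketch of the dilated attention sparsification pattern from LongNet:
# split a sequence into segments of length w, then within each segment keep
# only every r-th position for attention. Illustrative only, not the repo's API.

def dilated_indices(seq_len: int, w: int, r: int) -> list[list[int]]:
    """Return, per segment of length w, the token indices kept after dilation r."""
    segments = []
    for start in range(0, seq_len, w):
        end = min(start + w, seq_len)
        # keep every r-th position inside this segment
        segments.append(list(range(start, end, r)))
    return segments

# Example: 16 tokens, segment length 8, dilation 2 ->
# each segment attends over 4 of its 8 positions.
print(dilated_indices(16, 8, 2))  # [[0, 2, 4, 6], [8, 10, 12, 14]]
```

With dilation r, each segment's attention operates over w/r positions instead of w, which is how LongNet scales toward very long contexts.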
