zhixuan-lin / forgetting-transformer

[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
104 · Updated 2 weeks ago

Alternatives and similar repositories for forgetting-transformer

Users who are interested in forgetting-transformer compare it to the libraries listed below.
