zhixuan-lin / forgetting-transformer

[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
97 · Updated 3 weeks ago

Alternatives and similar repositories for forgetting-transformer:

Users who are interested in forgetting-transformer compare it to the repositories listed below.