zhixuan-lin / forgetting-transformer

[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
76 stars · Updated last week
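
The core idea named in the title is softmax attention augmented with a data-dependent forget gate. Below is a minimal, illustrative PyTorch sketch of that idea: the forget gate contributes a cumulative log-decay bias to the causal attention scores. The function name, tensor shapes, and gating details here are assumptions for illustration, not the repository's actual API.

```python
# Minimal sketch of "softmax attention with a forget gate" (illustrative only;
# not the repository's implementation or API).
import torch
import torch.nn.functional as F

def forgetting_attention(q, k, v, forget_gate_logits):
    """Causal softmax attention with a multiplicative forget gate.

    q, k, v:            (batch, seq_len, head_dim)
    forget_gate_logits: (batch, seq_len) pre-sigmoid forget-gate values
    """
    b, t, d = q.shape
    # log f_t in (-inf, 0); cumulative sum gives the log of a product of gates
    log_f = F.logsigmoid(forget_gate_logits)                 # (b, t)
    cum_log_f = torch.cumsum(log_f, dim=-1)                  # (b, t)
    # Decay bias D[i, j] = sum of log f over positions j+1..i = cum[i] - cum[j]
    decay_bias = cum_log_f.unsqueeze(-1) - cum_log_f.unsqueeze(-2)  # (b, t, t)
    scores = q @ k.transpose(-1, -2) / d ** 0.5 + decay_bias
    causal_mask = torch.ones(t, t, dtype=torch.bool, device=q.device).tril()
    scores = scores.masked_fill(~causal_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage
q, k, v = (torch.randn(2, 8, 16) for _ in range(3))
gate_logits = torch.randn(2, 8)
out = forgetting_attention(q, k, v, gate_logits)
print(out.shape)  # torch.Size([2, 8, 16])
```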

Alternatives and similar repositories for forgetting-transformer:

Users interested in forgetting-transformer are comparing it to the libraries listed below.