JerryYin777 / Cross-Layer-Attention

Self-reproduction code for the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL)
16 stars · Updated last year
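The paper behind this repository proposes sharing key/value activations across layers so that fewer layers contribute entries to the KV cache. The snippet below is a minimal, illustrative sketch of that idea only, not code from this repository or the paper: every second attention layer reuses the K/V produced by the layer before it, roughly halving the cached tensors. All names (SharedKVAttention, produces_kv, the dimensions) are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedKVAttention(nn.Module):
    """Attention block that either computes its own K/V or reuses K/V
    handed down from an earlier layer (cross-layer sharing sketch)."""

    def __init__(self, d_model: int, n_heads: int, produces_kv: bool):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.produces_kv = produces_kv
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)
        if produces_kv:
            # Only "producer" layers own K/V projections and KV-cache entries.
            self.k_proj = nn.Linear(d_model, d_model, bias=False)
            self.v_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, shared_kv=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        if self.produces_kv:
            k = self.k_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            v = self.v_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            shared_kv = (k, v)   # this is what would be stored in the KV cache
        else:
            k, v = shared_kv     # reuse the previous layer's K/V instead
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out), shared_kv


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(2, 8, 64)
    # Sharing factor of 2: layers 0, 2 produce K/V; layers 1, 3 reuse them.
    layers = [SharedKVAttention(64, 4, produces_kv=(i % 2 == 0)) for i in range(4)]
    kv = None
    for layer in layers:
        x, kv = layer(x, kv)
    print(x.shape)  # torch.Size([2, 8, 64])
```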

Alternatives and similar repositories for Cross-Layer-Attention

Users interested in Cross-Layer-Attention are comparing it to the repositories listed below.
