xyltt / Linear-TransformerLinks

Transformer are RNNs: Fast Autoregressive Transformer with Linear Attention

☆23

Alternatives and similar repositories for Linear-Transformer

Users that are interested in Linear-Transformer are comparing it to the libraries listed below

Sorting:

lancopku / Explicit-Sparse-Transformer
code for Explicit Sparse Transformer
☆61Updated 2 years ago
XueFuzhao / WideNet_Code
Implementation of AAAI 2022 Paper: Go wider instead of deeper
☆32Updated 2 years ago
OpenNLPLab / cosFormer
[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
☆195Updated 2 years ago
Somedaywilldo / BM-NAS
BM-NAS: Bilevel Multimodal Neural Architecture Search (AAAI 2022 Oral)
☆19Updated 2 years ago
LibertFan / MAN
Mask Attention Networks: Rethinking and Strengthen Transformer in NAACL2021
☆14Updated 4 years ago
yikangshen / MoA
Mixture of Attention Heads
☆49Updated 3 years ago
haofanwang / awesome-mlp-papers
Recent Advances in MLP-based Models (MLP is all you need!)
☆116Updated 2 years ago
RunxinXu / ContrastivePruning
Source code for our AAAI'22 paper 《From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression》
☆25Updated 3 years ago
lucidrains / FLASH-pytorch
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
☆369Updated 2 years ago
Noahs-ARK / RFA
☆33Updated 4 years ago
VITA-Group / ViT-Anti-Oversmoothing
[ICLR 2022] "Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice" by Peihao Wang, Wen…
☆81Updated last year
cmsflash / efficient-attention
An implementation of the efficient attention module.
☆321Updated 4 years ago
DRSY / EMO
[ICLR 2024]EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling(https://arxiv.org/abs/2310.04691)
☆126Updated last year
liuruiyang98 / Jittor-MLP
Unofficial Implementation of MLP-Mixer, gMLP, resMLP, Vision Permutator, S2MLP, S2MLPv2, RaftMLP, HireMLP, ConvMLP, AS-MLP, SparseMLP, Co…
☆169Updated 3 years ago
VITA-Group / UVC
[ICLR 2022] "Unified Vision Transformer Compression" by Shixing Yu*, Tianlong Chen*, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Li…
☆53Updated last year
zhihou7 / BatchFormer
CVPR2022, BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning, https://arxiv.org/abs/2203.01522
☆252Updated 2 years ago
szhangtju / The-compression-of-Transformer
☆64Updated 4 years ago
ziyuwwang / DynaMixer
☆27Updated 3 years ago
rishikksh20 / rectified-linear-attention
Sparse Attention with Linear Units
☆19Updated 4 years ago
jaketae / g-mlp
PyTorch implementation of Pay Attention to MLPs
☆40Updated 4 years ago
ml-researcher / diffusion
☆23Updated 3 years ago
thuml / LogME
Code release for "LogME: Practical Assessment of Pre-trained Models for Transfer Learning" (ICML 2021) and Ranking and Tuning Pre-trained…
☆209Updated 2 years ago
kuixu / Linear-Multihead-Attention
Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)
☆75Updated 5 years ago
microsoft / TokenMixers
☆151Updated last year
transformer-vq / transformer_vq
☆196Updated last year
ZhuiyiTechnology / GAU-alpha
基于Gated Attention Unit的Transformer模型（尝鲜版）
☆98Updated 2 years ago
CyberZHG / torch-multi-head-attention
Multi-head attention in PyTorch
☆153Updated 6 years ago
ziplab / LIT
[AAAI 2022] This is the official PyTorch implementation of "Less is More: Pay Less Attention in Vision Transformers"
☆97Updated 3 years ago
Junya-Chen / FlatCLR
FlatNCE: A Novel Contrastive Representation Learning Objective
☆90Updated 3 years ago
lancopku / well-classified-examples-are-underestimated
Code for the AAAI 2022 publication "Well-classified Examples are Underestimated in Classification with Deep Neural Networks"
☆53Updated 3 years ago