xyltt / Linear-Transformer
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
☆24 · Updated 4 years ago
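The paper's core trick is replacing the softmax in attention with a kernel feature map, so the products can be reassociated and attention computed in time linear in sequence length. Below is a minimal PyTorch sketch of non-causal linear attention using the elu(x)+1 feature map from the paper; the function names and shapes are illustrative, not this repository's actual API.

```python
import torch
import torch.nn.functional as F

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, the kernel feature map proposed in the paper.
    return F.elu(x) + 1

def linear_attention(q, k, v, eps=1e-6):
    # q, k: (batch, seq, heads, dim); v: (batch, seq, heads, dim_v).
    # Associativity lets us compute phi(Q) (phi(K)^T V) instead of
    # (phi(Q) phi(K)^T) V, cutting the cost from O(N^2) to O(N) in N.
    q, k = elu_feature_map(q), elu_feature_map(k)
    kv = torch.einsum("bshd,bshm->bhdm", k, v)  # sum over positions once
    z = 1.0 / (torch.einsum("bshd,bhd->bsh", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("bshd,bhdm,bsh->bshm", q, kv, z)

# Quick shape check with hypothetical sizes:
out = linear_attention(torch.randn(2, 128, 4, 32),
                       torch.randn(2, 128, 4, 32),
                       torch.randn(2, 128, 4, 32))
print(out.shape)  # torch.Size([2, 128, 4, 32])
```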
Alternatives and similar repositories for Linear-Transformer
Users interested in Linear-Transformer are comparing it to the repositories listed below
- Code for Explicit Sparse Transformer ☆61 · Updated 2 years ago
- [ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention ☆196 · Updated 3 years ago
- Implementation of the AAAI 2022 paper "Go Wider Instead of Deeper" ☆32 · Updated 3 years ago
- Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time" ☆371 · Updated 2 years ago
- Mixture of Attention Heads ☆51 · Updated 3 years ago
- [ICLR 2022] "Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice" by Peihao Wang, Wen… ☆82 · Updated last year
- PyTorch implementation of Pay Attention to MLPs ☆40 · Updated 4 years ago
- Recent Advances in MLP-based Models (MLP is all you need!) ☆117 · Updated 3 years ago
- ☆27 · Updated 3 years ago
- An implementation of the efficient attention module. ☆327 · Updated 5 years ago
- ☆200 · Updated 2 years ago
- ☆33 · Updated 4 years ago
- ☆64 · Updated 5 years ago
- [ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691) ☆128 · Updated last year
- Source code for our AAAI'22 paper "From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression" ☆25 · Updated 4 years ago
- Reproducing the Linear Multihead Attention introduced in the Linformer paper (Linformer: Self-Attention with Linear Complexity) ☆75 · Updated 5 years ago
- ☆22 · Updated 2 years ago
- The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha… ☆71 · Updated 4 years ago
- Code release for "LogME: Practical Assessment of Pre-trained Models for Transfer Learning" (ICML 2021) and Ranking and Tuning Pre-trained… ☆211 · Updated 2 years ago
- [AAAI 2022] Official PyTorch implementation of "Less is More: Pay Less Attention in Vision Transformers" ☆97 · Updated 3 years ago
- A Tight-fisted Optimizer (Tiger), implemented in PyTorch ☆12 · Updated last year
- [ICLR 2022] "Unified Vision Transformer Compression" by Shixing Yu*, Tianlong Chen*, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Li… ☆55 · Updated 2 years ago
- Code for the AAAI 2022 paper "Well-classified Examples are Underestimated in Classification with Deep Neural Networks" ☆54 · Updated 3 years ago
- Learning with Noisy Labels, Label Noise (ICML 2021) ☆46 · Updated 2 years ago
- [EMNLP 2022] Official implementation of Transnormer from the paper "The Devil in Linear Transformer" ☆64 · Updated 2 years ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆87 · Updated 2 years ago
- A Transformer model based on the Gated Attention Unit (preview version) ☆98 · Updated 2 years ago
- Learning to Encode Position for Transformer with Continuous Dynamical Model ☆59 · Updated 5 years ago
- iFormer: Inception Transformer ☆247 · Updated 2 years ago
- A PyTorch & Keras implementation and demo of Fastformer ☆191 · Updated 3 years ago