xyltt / Linear-Transformer
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
☆18Updated 3 years ago
Related projects
Alternatives and complementary repositories for Linear-Transformer
- Code for Explicit Sparse Transformer☆56Updated last year
- [ICLR 2022] "Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice" by Peihao Wang, Wen…☆75Updated 10 months ago
- The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha…☆60Updated 3 years ago
- Code for the AAAI 2022 publication "Well-classified Examples are Underestimated in Classification with Deep Neural Networks"☆43Updated 2 years ago
- PyTorch implementation of Pay Attention to MLPs☆39Updated 3 years ago
- Mixture of Attention Heads☆39Updated 2 years ago
- Mask Attention Networks: Rethinking and Strengthen Transformer in NAACL2021☆15Updated 3 years ago
- [AAAI 2022] This is the official PyTorch implementation of "Less is More: Pay Less Attention in Vision Transformers"☆92Updated 2 years ago
- ☆32Updated 3 years ago
- Official implementation of "CAT: Cross Attention in Vision Transformer".☆142Updated 2 years ago
- Recent Advances in MLP-based Models (MLP is all you need!)☆112Updated last year
- An implementation of the efficient attention module.☆284Updated 3 years ago
- [ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691)☆114Updated 8 months ago
- ☆27Updated 2 years ago
- Code for the paper "Gaussian Transformer: A Lightweight Approach for Natural Language Inference"☆27Updated 4 years ago
- PyTorch implementation of MLP-Mixer☆36Updated 3 years ago
- Paper notes tracking Transformer upgrades☆18Updated last year
- [NeurIPS'21] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang…☆90Updated 11 months ago
- Implementation of AAAI 2022 Paper: Go wider instead of deeper☆32Updated 2 years ago
- Implementation for Context-Gated Convolution☆58Updated 3 years ago
- Unofficial Implementation of MLP-Mixer, gMLP, resMLP, Vision Permutator, S2MLP, S2MLPv2, RaftMLP, HireMLP, ConvMLP, AS-MLP, SparseMLP, Co…☆166Updated 2 years ago
- Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)☆73Updated 4 years ago
- Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using PyTorch☆70Updated 4 years ago
- Learning with Noisy Labels, Label Noise, ICML 2021☆42Updated last year
- FlatNCE: A Novel Contrastive Representation Learning Objective☆87Updated 3 years ago
- Code for "Understanding and Improving Layer Normalization"☆46Updated 4 years ago
- Official implementation for paper "Relational Surrogate Loss Learning", ICLR 2022☆37Updated last year
- Custom PyTorch implementation of MoCo v3☆44Updated 3 years ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling☆79Updated last year
- [ICLR'22] This is an official implementation for "AS-MLP: An Axial Shifted MLP Architecture for Vision".☆124Updated 2 years ago