xyltt / Linear-Transformer
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
☆24 · Updated 5 years ago
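The paper this repository implements replaces softmax attention with a kernel feature map so attention can be computed in time linear in sequence length. As a rough, non-causal sketch of that idea (hypothetical NumPy code, not taken from this repository), using the paper's φ(x) = elu(x) + 1 feature map:

```python
import numpy as np

def elu_feature_map(x):
    # φ(x) = elu(x) + 1: a positive feature map, so the normalizer stays positive
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention: φ(Q) (φ(K)^T V) / (φ(Q) · Σ_j φ(k_j))."""
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    KV = Kf.T @ V                 # (d_k, d_v) summary, cost independent of N^2
    Z = Qf @ Kf.sum(axis=0)       # (N,) per-query normalizer
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(0)
N, d = 6, 4
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # → (6, 4)
```

Because the (d_k × d_v) summary φ(K)ᵀV is formed once, the cost scales linearly in N rather than quadratically; the paper's autoregressive variant maintains this summary as a running prefix sum instead.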
Alternatives and similar repositories for Linear-Transformer
Users interested in Linear-Transformer are comparing it to the repositories listed below.
- Code for Explicit Sparse Transformer ☆61 · Updated 2 years ago
- Implementation of the AAAI 2022 paper "Go wider instead of deeper" ☆32 · Updated 3 years ago
- Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time" ☆372 · Updated 2 years ago
- [ICLR 2022] Official implementation of cosformer-attention in "cosFormer: Rethinking Softmax in Attention" ☆198 · Updated 3 years ago
- Recent Advances in MLP-based Models (MLP is all you need!) ☆116 · Updated 3 years ago
- ☆65 · Updated 5 years ago
- An implementation of the efficient attention module ☆328 · Updated 5 years ago
- [ICLR 2022] "Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice" by Peihao Wang, Wen… ☆83 · Updated 2 years ago
- Reproducing the Linear Multihead Attention introduced in "Linformer: Self-Attention with Linear Complexity" ☆75 · Updated 5 years ago
- BM-NAS: Bilevel Multimodal Neural Architecture Search (AAAI 2022 Oral) ☆19 · Updated 3 years ago
- ☆27 · Updated 3 years ago
- PyTorch implementation of "Pay Attention to MLPs" ☆40 · Updated 4 years ago
- Official codebase of the paper "Revisiting Sparse Convolutional Model for Visual Recognition" ☆125 · Updated 2 years ago
- Mixture of Attention Heads ☆51 · Updated 3 years ago
- Multi-head attention in PyTorch ☆156 · Updated 6 years ago
- [ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691) ☆128 · Updated last year
- Unofficial implementation of MLP-Mixer, gMLP, resMLP, Vision Permutator, S2MLP, S2MLPv2, RaftMLP, HireMLP, ConvMLP, AS-MLP, SparseMLP, Co… ☆170 · Updated 3 years ago
- iFormer: Inception Transformer ☆247 · Updated 3 years ago
- ☆201 · Updated 2 years ago
- A Transformer model based on the Gated Attention Unit (preview version) ☆98 · Updated 2 years ago
- [ICLR 2022] "Unified Vision Transformer Compression" by Shixing Yu*, Tianlong Chen*, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Li… ☆55 · Updated 2 years ago
- Mask Attention Networks: Rethinking and Strengthen Transformer (NAACL 2021) ☆14 · Updated 4 years ago
- For the paper "Gaussian Transformer: A Lightweight Approach for Natural Language Inference" ☆28 · Updated 5 years ago
- Simple tutorials on PyTorch DDP training ☆286 · Updated 3 years ago
- [ICLR 2021 top 3%] Is Attention Better Than Matrix Decomposition? ☆341 · Updated 3 years ago
- ☆33 · Updated 4 years ago
- FlatNCE: A Novel Contrastive Representation Learning Objective ☆90 · Updated 4 years ago
- An implementation of the loss function proposed in https://arxiv.org/pdf/2110.06848.pdf ☆117 · Updated 4 years ago
- [AAAI 2022] Official PyTorch implementation of "Less is More: Pay Less Attention in Vision Transformers" ☆97 · Updated 3 years ago
- [NeurIPS 2023] PyTorch implementation of Scheduled (Stable) Weight Decay ☆62 · Updated 2 years ago