OpenNLPLab / cosFormer
[ICLR 2022] Official implementation of cosFormer attention from the paper "cosFormer: Rethinking Softmax in Attention"
☆185 · Updated 2 years ago
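Since the list below is all about attention variants, a minimal sketch of the non-causal cosFormer attention may help orient readers: a ReLU feature map plus the paper's cos-based re-weighting, decomposed via cos(x − y) = cos(x)cos(y) + sin(x)sin(y) so the cost stays linear in sequence length. The function name, single-head layout, and tensor shapes here are illustrative assumptions; the repository above is the reference implementation.

```python
import math
import torch
import torch.nn.functional as F

def cosformer_attention(q, k, v, eps=1e-6):
    """Illustrative sketch of non-causal cosFormer attention.

    q, k: (batch, seq_len, dim); v: (batch, seq_len, dim_v).
    ReLU feature map + cos re-weighting, decomposed with
    cos(x - y) = cos(x)cos(y) + sin(x)sin(y), giving
    O(n * d * d_v) cost instead of O(n^2).
    """
    b, n, d = q.shape
    m = n  # M in the paper: the (maximum) sequence length
    idx = torch.arange(n, device=q.device, dtype=q.dtype)
    angle = (math.pi / 2) * idx / m            # pi * i / (2M)
    cos_w = torch.cos(angle)[None, :, None]    # (1, n, 1)
    sin_w = torch.sin(angle)[None, :, None]

    q, k = F.relu(q), F.relu(k)                # non-negative feature map
    q_cos, q_sin = q * cos_w, q * sin_w
    k_cos, k_sin = k * cos_w, k * sin_w

    # Summarize K^T V once, then apply per query: linear in n.
    kv_cos = torch.einsum('bnd,bne->bde', k_cos, v)
    kv_sin = torch.einsum('bnd,bne->bde', k_sin, v)
    num = (torch.einsum('bnd,bde->bne', q_cos, kv_cos)
           + torch.einsum('bnd,bde->bne', q_sin, kv_sin))

    # Row-wise normalizer: q_i . sum_j k_j (cos and sin parts).
    den = (torch.einsum('bnd,bd->bn', q_cos, k_cos.sum(dim=1))
           + torch.einsum('bnd,bd->bn', q_sin, k_sin.sum(dim=1)))
    return num / den.clamp(min=eps).unsqueeze(-1)
```

For example, `cosformer_attention(torch.randn(2, 1024, 64), torch.randn(2, 1024, 64), torch.randn(2, 1024, 64))` returns a `(2, 1024, 64)` tensor; the causal variant in the paper replaces the sums over keys with prefix sums.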
Alternatives and similar repositories for cosFormer
Users interested in cosFormer are comparing it to the libraries listed below:
- Official PyTorch implementation of Long-Short Transformer (NeurIPS 2021). ☆226 · Updated 2 years ago
- Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time" ☆357 · Updated last year
- Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in PyTorch ☆118 · Updated 3 years ago
- Implementation of Linformer for PyTorch ☆262 · Updated last year
- An implementation of local windowed attention for language modeling ☆403 · Updated this week
- Recent Advances in MLP-based Models (MLP is all you need!) ☆113 · Updated 2 years ago
- Official code for the CVPR'22 paper “Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space” ☆247 · Updated last year
- Custom PyTorch implementation of MoCo v3 ☆45 · Updated 3 years ago
- [ICML 2021 Oral] We show pure attention suffers from rank collapse, and how different mechanisms combat it. ☆163 · Updated 3 years ago
- Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention ☆256 · Updated 3 years ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆80 · Updated last year
- Sequence modeling with Mega. ☆297 · Updated last year
- An implementation of the loss function proposed in https://arxiv.org/pdf/2110.06848.pdf ☆111 · Updated 3 years ago
- Code release for "LogME: Practical Assessment of Pre-trained Models for Transfer Learning" (ICML 2021) and Ranking and Tuning Pre-trained… ☆204 · Updated last year
- Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms (see the mixing sketch after this list) ☆256 · Updated 3 years ago
- Reproducing the Linear Multihead Attention introduced in the Linformer paper (Linformer: Self-Attention with Linear Complexity) ☆72 · Updated 4 years ago
- Transformer based on a variant of attention that is linear in complexity with respect to sequence length ☆724 · Updated 8 months ago
- [ICLR 2023] Official implementation of the Toeplitz Neural Network (TNN) from the ICLR 2023 paper "Toeplitz Neural Network for Sequence Modeling" ☆76 · Updated 8 months ago
- [EMNLP 2022] Official implementation of Transnormer from the EMNLP 2022 paper "The Devil in Linear Transformer" ☆58 · Updated last year
- A simple cross attention that updates both the source and target in one step ☆161 · Updated 8 months ago
- An implementation of the efficient attention module. ☆297 · Updated 4 years ago
- Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks, de… ☆98 · Updated 2 years ago
- The pure and clear PyTorch Distributed Training Framework. ☆275 · Updated 11 months ago
- ☆192 · Updated last year
- ☆244 · Updated 2 years ago
- A PyTorch & Keras implementation and demo of Fastformer. ☆188 · Updated 2 years ago
- [CVPR 2022] BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning (https://arxiv.org/abs/2203.01522) ☆250 · Updated last year
- Implementation of the paper "Self-Attention with Relative Position Representations" ☆125 · Updated 4 years ago
- PyTorch repository for the ICLR 2022 GSAM paper, which improves generalization (e.g. +3.8% top-1 accuracy on ImageNet with ViT-B/32) ☆139 · Updated 2 years ago
- PyTorch codes for "LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning" ☆235 · Updated last year
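Of the items above, FNet (the "Mixing Tokens with Fourier Transforms" entry) is the easiest to sketch: the self-attention sublayer is replaced by an unparameterized 2D DFT whose real part is kept. A toy version under the usual (batch, seq_len, hidden) layout; the function name is an assumption, not the repo's API.

```python
import torch

def fnet_mixing(x):
    """FNet token mixing (toy sketch): FFT along the hidden dim,
    then along the sequence dim, keeping only the real part.
    No learned parameters; x has shape (batch, seq_len, hidden)."""
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

mixed = fnet_mixing(torch.randn(2, 128, 256))  # same shape out
```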