erksch / fnet-pytorch
Unofficial PyTorch implementation of Google's FNet: Mixing Tokens with Fourier Transforms. With checkpoints.
☆73 · Updated 2 years ago
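FNet's core idea is to replace the self-attention sublayer with an unparameterized 2D Fourier transform over the sequence and hidden dimensions, keeping only the real part. Below is a minimal sketch of that mixing block following the paper's description; it is not this repository's exact code, and the class name, layer sizes, and dropout value are illustrative.

```python
# Minimal FNet-style encoder block: Fourier mixing in place of self-attention.
# Sketch only -- names and hyperparameters here are assumptions, not the repo's API.
import torch
import torch.nn as nn


class FNetBlock(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, dropout: float = 0.1):
        super().__init__()
        self.mixing_norm = nn.LayerNorm(dim)
        self.ff_norm = nn.LayerNorm(dim)
        self.feed_forward = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        # FFT along the hidden dimension, then the sequence dimension; keep the real part.
        mixed = torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
        x = self.mixing_norm(x + mixed)          # residual + LayerNorm around the mixing
        x = self.ff_norm(x + self.feed_forward(x))  # residual + LayerNorm around the FFN
        return x


if __name__ == "__main__":
    block = FNetBlock(dim=256, hidden_dim=1024)
    tokens = torch.randn(2, 128, 256)
    print(block(tokens).shape)  # torch.Size([2, 128, 256])
```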
Alternatives and similar repositories for fnet-pytorch:
Users interested in fnet-pytorch are comparing it to the repositories listed below.
- Implementation of Fast Transformer in Pytorch ☆173 · Updated 3 years ago
- ☆164 · Updated 2 years ago
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021). ☆225 · Updated 2 years ago
- Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch ☆118 · Updated 3 years ago
- TF/Keras code for DiffStride, a pooling layer with learnable strides. ☆125 · Updated 3 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆212 · Updated 2 years ago
- Sequence Modeling with Structured State Spaces ☆63 · Updated 2 years ago
- Implementations of various linear RNN layers using pytorch and triton ☆50 · Updated last year
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆60 · Updated 2 years ago
- Implementation of Agent Attention in Pytorch ☆90 · Updated 8 months ago
- PyTorch implementation of FNet: Mixing Tokens with Fourier transforms ☆26 · Updated 3 years ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆99 · Updated 2 years ago
- Unofficial PyTorch Implementation for pNLP-Mixer: an Efficient all-MLP Architecture for Language (https://arxiv.org/abs/2202.04350) ☆63 · Updated 3 years ago
- Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms ☆259 · Updated 3 years ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆204 · Updated last year
- Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning ☆160 · Updated last year
- Code for the paper PermuteFormer ☆42 · Updated 3 years ago
- Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers. ☆103 · Updated 3 years ago
- Relative Positional Encoding for Transformers with Linear Complexity ☆62 · Updated 3 years ago
- Implementation of Nyström Self-attention, from the paper Nyströmformer ☆130 · Updated last week
- Transformers w/o Attention, based fully on MLPs ☆93 · Updated 11 months ago
- An implementation of local windowed attention for language modeling ☆431 · Updated 2 months ago
- Implementation of Linformer for Pytorch ☆276 · Updated last year
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch ☆88 · Updated last year
- ☆75 · Updated 4 years ago
- ResiDual: Transformer with Dual Residual Connections, https://arxiv.org/abs/2304.14802 ☆93 · Updated last year
- A PyTorch realization of Adafactor (https://arxiv.org/pdf/1804.04235.pdf) ☆23 · Updated 5 years ago
- Layerwise Batch Entropy Regularization ☆22 · Updated 2 years ago
- Official code for Long Expressive Memory (ICLR 2022, Spotlight) ☆69 · Updated 3 years ago
- ☆73 · Updated 2 years ago