maum-ai / pnlp-mixer
Unofficial PyTorch implementation of pNLP-Mixer: an Efficient all-MLP Architecture for Language (https://arxiv.org/abs/2202.04350)
☆63 · Updated 3 years ago
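For context, pNLP-Mixer replaces self-attention with MLP-Mixer-style token- and channel-mixing MLPs applied to projection-based token features. The snippet below is a minimal, generic sketch of such a mixer block in PyTorch; the layer sizes, pre-norm placement, and omission of the paper's projection/bottleneck layers are assumptions for illustration, not the code from this repository.

```python
# Minimal sketch of an MLP-Mixer-style block, as used by all-MLP language
# models such as pNLP-Mixer. Illustrative only: dimensions and normalization
# placement are assumed, and the repo's projection/bottleneck layers are omitted.
import torch
import torch.nn as nn


class MixerBlock(nn.Module):
    def __init__(self, seq_len: int, hidden_dim: int, token_mlp_dim: int, channel_mlp_dim: int):
        super().__init__()
        # Token-mixing MLP: mixes information across the sequence dimension.
        self.token_norm = nn.LayerNorm(hidden_dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(seq_len, token_mlp_dim),
            nn.GELU(),
            nn.Linear(token_mlp_dim, seq_len),
        )
        # Channel-mixing MLP: mixes information across the feature dimension.
        self.channel_norm = nn.LayerNorm(hidden_dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(hidden_dim, channel_mlp_dim),
            nn.GELU(),
            nn.Linear(channel_mlp_dim, hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        y = self.token_norm(x).transpose(1, 2)           # (batch, hidden_dim, seq_len)
        x = x + self.token_mlp(y).transpose(1, 2)        # residual token mixing
        x = x + self.channel_mlp(self.channel_norm(x))   # residual channel mixing
        return x


if __name__ == "__main__":
    block = MixerBlock(seq_len=64, hidden_dim=256, token_mlp_dim=256, channel_mlp_dim=512)
    out = block(torch.randn(2, 64, 256))
    print(out.shape)  # torch.Size([2, 64, 256])
```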
Alternatives and similar repositories for pnlp-mixer
Users interested in pnlp-mixer are comparing it to the libraries listed below.
- Implementation of Fast Transformer in Pytorch ☆173 · Updated 3 years ago
- Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch ☆73 · Updated 2 years ago
- Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch ☆118 · Updated 3 years ago
- A PyTorch implementation of Luna: Linear Unified Nested Attention ☆41 · Updated 3 years ago
- Axial Positional Embedding for Pytorch ☆79 · Updated 2 months ago
- Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Residual Quantization" ☆107 · Updated 3 years ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆100 · Updated 2 years ago
- Sequence modeling with Mega. ☆295 · Updated 2 years ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆203 · Updated last year
- Official code for Wav2Seq ☆96 · Updated 2 years ago
- Official PyTorch implementation of Long-Short Transformer (NeurIPS 2021). ☆225 · Updated 3 years ago
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ☆227 · Updated 8 months ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆213 · Updated 2 years ago
- Implementation of Token Shift GPT - an autoregressive model that relies solely on shifting the sequence space for mixing ☆50 · Updated 3 years ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆118 · Updated 6 months ago
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences" ☆70 · Updated 2 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Updated 3 years ago
- A variant of Transformer-XL where the memory is updated not with a queue, but with attention ☆48 · Updated 4 years ago
- ☆64 · Updated 8 months ago
- Implementation of RealFormer using PyTorch ☆100 · Updated 4 years ago
- ☆164 · Updated 2 years ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch ☆37 · Updated 3 years ago
- Relative Positional Encoding for Transformers with Linear Complexity ☆63 · Updated 3 years ago
- [ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling ☆79 · Updated last year
- PyTorch implementation of RNN-Transducer (RNN-T). ☆75 · Updated 4 years ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆57 · Updated last year
- My explorations into editing the knowledge and memories of an attention network ☆34 · Updated 2 years ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆55 · Updated 11 months ago
- Implementation of a Light Recurrent Unit in Pytorch ☆46 · Updated 7 months ago
- ResiDual: Transformer with Dual Residual Connections, https://arxiv.org/abs/2304.14802 ☆93 · Updated last year